High-level Path Operations Using pathlib Module In Python - GeekPython

The pathlib module is a part of Python’s standard library and allows us to interact with filesystem paths and work with files using various methods and properties on the Path object.

Getting Started With pathlib

The most frequently used class of the pathlib module is Path. It is the best place to start if we are using this module for the first time or are not sure which class to use for our task.

# Importing Path class from pathlib

from pathlib import Path

# Instantiating the Path

path = Path(__file__)

print(path)

In the above example, first, we imported the Path class from the pathlib module and then instantiated the Path with __file__.

This returns the absolute path to the current file, main.py, on which we are working.

D:\SACHIN\Pycharm\pathlib_module\main.py

The Path class creates a concrete path for the operating system the code runs on. Because we’re using Windows, printing the type of path gives the following output.

print(type(path))

----------

<class 'pathlib.WindowsPath'>

Before we get into the methods and properties of Path, it’s important to understand that the Path classes are divided into pure paths and concrete paths.

Pure Paths

Pure paths let us manipulate paths written for another operating system, such as a Windows path on a Unix machine or vice versa, without accessing the filesystem.

Pure paths only support computational operations and do not support I/O operations such as reading, writing, or manipulating files.
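As a minimal sketch of this idea, the snippet below handles a Windows-style path and a POSIX-style path with the pure classes. It should behave the same on any operating system because nothing here touches the filesystem. The paths themselves are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical paths used only for illustration; neither needs to exist.
win_path = PureWindowsPath('C:/Users/demo/report.txt')
posix_path = PurePosixPath('/home/demo/report.txt')

# Pure paths only compute on the path string, so this runs on any OS.
print(win_path.name)
print(win_path.parent)
print(posix_path.parent)
```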

Pathlib’s PurePath

PurePath is a class that is used to perform various operations on the path object. Consider the example below, in which we instantiate the PurePath() class.

# Importing PurePath class from pathlib

from pathlib import PurePath

path = PurePath('main.py')

print(path)

print(type(path))

----------

main.py

<class 'pathlib.PureWindowsPath'>

Running the above code on a Windows machine gives us a PureWindowsPath object. On a non-Windows machine, we would get a PurePosixPath object instead.

The PurePath() has two subclasses, which are as follows:

PureWindowsPath()
PurePosixPath()

PureWindowsPath

This subclass is implemented for Windows filesystem paths, as the name suggests.

# Importing PureWindowsPath class from pathlib

from pathlib import PureWindowsPath

# Instantiating PureWindowsPath

path = PureWindowsPath('main.py')

print(path)

print(type(path))

----------

main.py

<class 'pathlib.PureWindowsPath'>

PurePosixPath

This subclass is used for non-Windows filesystem paths.

# Importing PurePosixPath class from pathlib

from pathlib import PurePosixPath

# Instantiating PurePosixPath

path = PurePosixPath('main.py')

print(path)

print(type(path))

----------

main.py

<class 'pathlib.PurePosixPath'>

PurePath Methods And Properties

PurePath provides several methods that allow us to perform various operations on filesystem paths.

Getting the drive name

The PurePath.drive property extracts the drive name from the specified path. It returns a string representing the drive name, or an empty string if the path contains no drive.

from pathlib import PurePath

# Path having a drive name

drive = PurePath('D:/SACHIN/Pycharm/test.py').drive

print(drive)

# Path without a drive name

no_drive = PurePath('/SACHIN/Pycharm/test.py').drive

print(no_drive)

----------

D:

The first path contains a drive name, which is printed as D:. The second path has no drive, so an empty string (a blank line) is printed.

Getting the root and stem

The root is the string that represents the top of the path hierarchy (\ in a Windows path, / in a POSIX path), which we can access with PurePath.root. The stem is the last component of the path without its suffix, which we can access with PurePath.stem.

from pathlib import PureWindowsPath, PurePosixPath

# Getting the root

root = PureWindowsPath('D:/SACHIN/Pycharm/').root

print(root)

# Getting the root from the Unix-like path

u_root = PurePosixPath('/SACHIN/Pycharm/').root

print(u_root)

# Getting the stem

stem = PureWindowsPath('D:/SACHIN/Pycharm/test.py').stem

print(stem)

----------

\

/

test
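To see how these pieces fit together, here is a small sketch that prints the drive, the root, and two related properties that the article has not introduced: anchor (drive and root combined) and parts (the path split into components). repr() is used so that empty strings stay visible.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath('D:/SACHIN/Pycharm/test.py')
posix = PurePosixPath('/SACHIN/Pycharm/test.py')

# repr() keeps empty strings visible in the output.
print(repr(win.drive), repr(win.root), repr(win.anchor))
print(repr(posix.drive), repr(posix.root), repr(posix.anchor))

# parts splits the path into its components, anchor first.
print(win.parts)
```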

Getting the ancestors of the path

The PurePath.parents can be used to access the logical ancestors of the path.

from pathlib import PureWindowsPath

ancestor = PureWindowsPath('D:/SACHIN/Pycharm/test.py')

print(ancestor.parents[0])

print(ancestor.parents[1])

print(ancestor.parents[-1])

----------

D:\SACHIN\Pycharm

D:\SACHIN

D:\

Using indexing, we were able to access the path’s ancestors. Python 3.10 added support for slices and negative index values for PurePath.parents.

Index 0 gives the full path except for the file name, index 1 goes one directory further back, and index -1 gives the beginning portion of the path.
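Because parents behaves like a sequence, we can also loop over it instead of indexing one ancestor at a time. A short sketch, reusing the example path from above:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath('D:/SACHIN/Pycharm/test.py')

# Ancestors are yielded from the nearest directory up to the anchor.
ancestors = [str(parent) for parent in path.parents]
print(ancestors)
```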

Getting the parent

The PurePath.parent allows us to access the logical parent of the path.

from pathlib import PureWindowsPath

p = PureWindowsPath('D:/SACHIN/Pycharm/test.py')

print(p.parent)

----------

D:\SACHIN\Pycharm

In the above example, the parent directory of test.py is Pycharm/, the parent directory of Pycharm/ is SACHIN/, and the parent directory of SACHIN/ is the drive D:/, which contains all of these directories and files.

That’s why we got this output D:\SACHIN\Pycharm.

Getting the name and suffix

PurePath.name provides access to the name of the path’s final component, while PurePath.suffix provides access to the file extension of the final component. If the file has multiple extensions, we can get the list of file extensions with PurePath.suffixes.

from pathlib import PureWindowsPath

# Accessing the name of the final component of the path

file_name = PureWindowsPath('D:/SACHIN/Pycharm/test.py').name

print(file_name)

# Accessing the suffix of the final component of the path

file_suffix = PureWindowsPath('D:/SACHIN/Pycharm/test.py').suffix

print(file_suffix)

----------

test.py

.py

The last component of the path is test.py, and the extension is .py, which is what we got in the output.

What if our test.py file has extensions like test.py.zip? If we want to extract both extensions, we can use PurePath.suffixes.

# Accessing the multiple suffix

file_suffixes = PureWindowsPath('test.py.zip').suffixes

print(file_suffixes)

----------

['.py', '.zip']
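One practical consequence is that stem removes only the last extension. If we want the bare name of a file with several extensions, one possible approach is to strip the joined suffixes from the name. The file name below is hypothetical, and str.removesuffix() requires Python 3.9 or newer.

```python
from pathlib import PurePosixPath

archive = PurePosixPath('backup.tar.gz')  # hypothetical file name

print(archive.suffix)    # only the last extension
print(archive.suffixes)  # every extension, in order
print(archive.stem)      # the name minus the last extension only

# Strip all extensions by removing the combined suffix from the name.
base_name = archive.name.removesuffix(''.join(archive.suffixes))
print(base_name)
```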

Check if a path is absolute

An absolute path has both a root and, if the path flavour allows it, a drive. We can use the PurePath.is_absolute() method to check whether a path is absolute; it returns a boolean value.

from pathlib import PureWindowsPath, PurePosixPath

print(PureWindowsPath('D:/SACHIN/').is_absolute())

True

print(PureWindowsPath('/SACHIN/').is_absolute())

False

print(PurePosixPath('/SACHIN/').is_absolute())

True

print(PurePosixPath('D:/SACHIN/').is_absolute())

False

Looking at the first two PureWindowsPath cases, we first get True because the path has both a drive and a root, but then we get False because the path lacks a drive.

In the PurePosixPath cases, we first got True even though the path did not have a drive name because non-Windows paths do not include drive names like Windows paths. But when we used the drive name in the path, we got False.
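The Windows rule can be checked from both sides. The sketch below adds one case not shown above: a path with a drive but no root (D:SACHIN), which is also not absolute.

```python
from pathlib import PureWindowsPath

checks = {
    'D:/SACHIN': PureWindowsPath('D:/SACHIN').is_absolute(),  # drive and root
    '/SACHIN': PureWindowsPath('/SACHIN').is_absolute(),      # root only
    'D:SACHIN': PureWindowsPath('D:SACHIN').is_absolute(),    # drive only
}

for text, absolute in checks.items():
    print(text, absolute)
```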

Combining paths

PurePath.joinpath() allows us to concatenate the path with the argument passed to it.

from pathlib import PurePath, PureWindowsPath, PurePosixPath

print(PurePath('D:/SACHIN/').joinpath('test.txt'))

D:\SACHIN\test.txt

print(PurePath('D:/SACHIN/').joinpath(PureWindowsPath('test_dir', 'test.txt')))

D:\SACHIN\test_dir\test.txt

print(PurePosixPath('/SACHIN/').joinpath(PurePosixPath('test_dir', 'test.txt')))

/SACHIN/test_dir/test.txt
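Two details are worth a quick sketch here: the / operator joins paths in the same way as joinpath(), and an absolute argument discards everything that came before it. The /etc/hosts path is only an illustrative value.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/SACHIN')

# joinpath() accepts several segments; the / operator is equivalent.
joined = base.joinpath('test_dir', 'test.txt')
with_operator = base / 'test_dir' / 'test.txt'
print(joined == with_operator)

# An absolute argument replaces the path built so far.
replaced = base.joinpath('/etc', 'hosts')
print(replaced)
```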

Matching the path

PurePath.match() matches the path against the provided glob-style pattern. It returns True if the path matches and False otherwise.

from pathlib import PureWindowsPath

p = PureWindowsPath('D:/SACHIN/test.txt').match('*.py')

print(p)

p1 = PureWindowsPath('D:/SACHIN/test.py').match('*.py')

print(p1)

p2 = PureWindowsPath('D:/SACHIN/test/test.py').match('test/*.py')

print(p2)

----------

False

True

True
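How the pattern is anchored depends on whether it is relative or absolute: a relative pattern is matched from the right-hand end of the path, while an absolute pattern must match the whole path. A small sketch with a POSIX-style path:

```python
from pathlib import PurePosixPath

path = PurePosixPath('/SACHIN/test/test.py')

# Relative patterns are compared against the trailing components.
relative_hit = path.match('test/*.py')
relative_miss = path.match('SACHIN/*.py')

# Absolute patterns must account for every component of the path.
absolute_miss = path.match('/*.py')
absolute_hit = path.match('/SACHIN/*/*.py')

print(relative_hit, relative_miss, absolute_miss, absolute_hit)
```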

Pattern matching can be case-sensitive depending on the path flavour: PurePosixPath matches case-sensitively, while PureWindowsPath ignores case.

from pathlib import PureWindowsPath, PurePosixPath

print(PurePosixPath('/test/test.py').match('*.Py'))

print(PureWindowsPath('/test/test.py').match('*.Py'))

----------

False

True

Changing the name

PurePath.with_name() accepts a name argument and returns the new path with the changed file name.

from pathlib import PureWindowsPath

chg_name = PureWindowsPath('D:/SACHIN/test.txt').with_name('test.py')

print(chg_name)

----------

D:\SACHIN\test.py

If there is no name in the path, then we’ll get a ValueError.

no_name = PureWindowsPath('D:/').with_name('test.py')

print(no_name)

----------

Traceback (most recent call last):

....

raise ValueError("%r has an empty name" % (self,))

ValueError: PureWindowsPath('D:/') has an empty name

Changing the stem

The PurePath.with_stem() method, added in Python 3.9, returns a new path with a different stem.

from pathlib import PureWindowsPath

chg_stem = PureWindowsPath('D:/SACHIN/example.py').with_stem('test')

print(chg_stem)

----------

D:\SACHIN\test.py

A ValueError is raised if the path does not have a name.

no_name = PureWindowsPath('D:/').with_stem('test')

print(no_name)

----------

Traceback (most recent call last):

....

raise ValueError("%r has an empty name" % (self,))

ValueError: PureWindowsPath('D:/') has an empty name

Changing the suffix

We can change the suffix using PurePath.with_suffix(). If the file name lacks a suffix, the provided suffix will be appended.

from pathlib import PureWindowsPath

suf = PureWindowsPath('D:/SACHIN/test.py').with_suffix('.txt')

print(suf)

no_suf = PureWindowsPath('D:/SACHIN/test').with_suffix('.py')

print(no_suf)

----------

D:\SACHIN\test.txt

D:\SACHIN\test.py

What happens if we supply an empty string? The file’s suffix will be removed.

empty_suf = PureWindowsPath('D:/SACHIN/test.py').with_suffix('')

print(empty_suf)

----------

D:\SACHIN\test
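All of these with_*() methods return a new path object and leave the original untouched, so they can be chained. A brief sketch (with_stem() needs Python 3.9 or newer):

```python
from pathlib import PureWindowsPath

original = PureWindowsPath('D:/SACHIN/example.txt')

# Each call returns a new object, so the calls can be chained.
renamed = original.with_stem('test').with_suffix('.py')

print(original)  # unchanged
print(renamed)
```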

Concrete Paths

Concrete paths support I/O operations on filesystem paths in addition to the computational operations of pure paths. Unlike pure paths, concrete paths let us read a file, write data to it, and query or modify the filesystem.

Concrete paths make system calls on path objects. They are subclasses of the pure path classes, and there are three classes we can use to instantiate them:

Path()
WindowsPath()
PosixPath()

Pathlib’s Path

At the beginning of the article, we had a glimpse of the Path class. It is a subclass of PurePath that represents a concrete filesystem path.

When we instantiate Path(), it creates either a PosixPath or a WindowsPath object, depending on the machine we’re working on.

# Importing Path class from pathlib

from pathlib import Path

# Instantiating the Path

path = Path('D:/SACHIN/Pycharm')

print(path)

print(type(path))

----------

D:\SACHIN\Pycharm

<class 'pathlib.WindowsPath'>

The Path() created a concrete Windows path because we’re on a Windows machine.

PosixPath

PosixPath is a subclass of both Path and PurePosixPath that represents concrete non-Windows filesystem paths.

PosixPath makes system calls that only exist on POSIX systems, so we can’t instantiate it on our Windows machine.

# Importing PosixPath class from pathlib

from pathlib import PosixPath

# Instantiating PosixPath

path = PosixPath('main.py')

print(path)

----------

Traceback (most recent call last):

....

raise NotImplementedError("cannot instantiate %r on your system"

NotImplementedError: cannot instantiate 'PosixPath' on your system

We can only instantiate the class that corresponds to our system. For example, we can instantiate the WindowsPath class on Windows machines and the PosixPath class on POSIX-compliant machines.
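In practice, this means portable code should usually call Path() and let pathlib choose the concrete class. The sketch below checks which class was chosen; the comparison against os.name is added here purely for illustration.

```python
import os
from pathlib import Path, PosixPath, PurePosixPath, PureWindowsPath, WindowsPath

# Path() picks the concrete class that matches the running system.
path = Path('main.py')
expected_cls = WindowsPath if os.name == 'nt' else PosixPath
print(isinstance(path, expected_cls))

# Pure flavours make no system calls, so both can be created anywhere.
foreign_paths = (PureWindowsPath('C:/demo'), PurePosixPath('/demo'))
print(foreign_paths)
```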

WindowsPath

WindowsPath is a subclass of both Path and PureWindowsPath that represents concrete Windows filesystem paths.

# Importing WindowsPath class from pathlib

from pathlib import WindowsPath

# Instantiating WindowsPath

path = WindowsPath('main.py')

print(path)

print(type(path))

----------

main.py

<class 'pathlib.WindowsPath'>

Path Methods

The Path class provides several methods for performing I/O operations on filesystem paths by interacting with the operating system.

Getting the current working directory and home directory

You may have used os.getcwd() to get the current working directory. Path.cwd() does the same thing, but returns a new path object representing the current working directory.

# Importing Path class from pathlib

from pathlib import Path

# Getting the current working directory

path = Path.cwd()

print(path)

----------

D:\SACHIN\Pycharm\pathlib_module

We obtained the path to our current working directory, and the path separator is a backslash (\) because we are using the Windows operating system.

Path.home() returns the path to the user’s home directory. If the home directory cannot be resolved, a RuntimeError is raised.

# Importing Path class from pathlib

from pathlib import Path

# Getting the home directory

path = Path.home()

print(path)

----------

C:\Users\SACHIN
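The following sketch compares Path.cwd() with os.getcwd() and guards the Path.home() call against the RuntimeError mentioned above. The printed paths depend on the machine, so no output is shown.

```python
import os
from pathlib import Path

# Path.cwd() wraps the same information as os.getcwd() in a Path object.
cwd = Path.cwd()
same_dir = cwd == Path(os.getcwd())
print(same_dir)

# Path.home() may fail if the home directory cannot be resolved.
try:
    home = Path.home()
except RuntimeError:
    home = None
print(home)
```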

Accessing the components of the path

We’ve seen the PurePath properties that help us access a path’s components. Since Path is a subclass of PurePath, we can use those properties with the Path class as well.

# Importing Path class from pathlib

from pathlib import Path

# Instantiating the path

path = Path('D:/SACHIN/test.py')

# Accessing the drive name

print(path.drive)

D:

# Accessing the root

print(path.root)

\

# Accessing the name

print(path.name)

test.py

# Accessing the stem

print(path.stem)

test

# Accessing the suffix

print(path.suffix)

.py

# Accessing the parent

print(path.parent)

D:\SACHIN

Iterating the directories

Using Path.iterdir(), we can get the path objects of the contents of the specified directory.

from pathlib import Path

path = Path('D:/SACHIN/Pycharm/pathlib_module')

# Iterating the pathlib_module directory

for files in path.iterdir():

    print(files)

----------

D:\SACHIN\Pycharm\pathlib_module\.idea

D:\SACHIN\Pycharm\pathlib_module\files

D:\SACHIN\Pycharm\pathlib_module\main.py

D:\SACHIN\Pycharm\pathlib_module\test.py

The path in the above code points to the pathlib_module directory, and we obtained the path objects of the directories and files contained within pathlib_module.

Here is another example of the .iterdir() method.

path = Path('files')

for files in path.iterdir():

    print(files)

----------

files\example.md

files\file.py

files\test.txt

We iterated through the contents of the files directory, which is located in the current working directory.
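iterdir() yields entries in arbitrary order and does not distinguish files from directories. A self-contained sketch that builds a throwaway directory with tempfile (so it does not depend on the folders used in this article) and then filters and sorts the entries might look like this. The file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical contents created only for this demonstration.
    (root / 'notes.txt').touch()
    (root / 'script.py').touch()
    (root / 'subdir').mkdir()

    # Sort the names because iterdir() gives no ordering guarantee.
    file_names = sorted(entry.name for entry in root.iterdir() if entry.is_file())
    dir_names = sorted(entry.name for entry in root.iterdir() if entry.is_dir())

print(file_names)
print(dir_names)
```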

Filesystem Modification

Creating a directory

Path.mkdir() creates a new directory at the specified path. The default mode=0o777 requests read, write, and execute permissions for all users and groups, although the mode is combined with the process’s umask, so the resulting permissions may be more restrictive.

from pathlib import Path

# Creating a new dir at the specified path

Path('D:/SACHIN/Pycharm/pathlib_module/new_dir').mkdir(mode=0o777)

When we execute the above code, a new directory called new_dir is created in the pathlib_module directory.

If the path already exists, we will receive a FileExistsError. If we run the above code again, we’ll get the following result.

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'D:\\SACHIN\\Pycharm\\pathlib_module\\new_dir'

The directory was not created because the path we specified already exists. However, .mkdir() has an exist_ok parameter that, when set to True, suppresses the error.

Path('D:/SACHIN/Pycharm/pathlib_module/new_dir').mkdir(exist_ok=True)

print('Directory created.')

----------

Directory created.

Note: Even with exist_ok=True, a FileExistsError is raised if the path’s final component is an existing non-directory file.
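mkdir() also accepts a parents parameter, not covered above, that creates any missing intermediate directories. The sketch below combines it with exist_ok and demonstrates the case described in the note. It works inside a temporary directory, and the directory names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'a' / 'b' / 'new_dir'

    # parents=True creates 'a' and 'b' as well; exist_ok=True makes reruns safe.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

    # exist_ok does not help when the target is an existing regular file.
    file_path = Path(tmp) / 'plain.txt'
    file_path.touch()
    try:
        file_path.mkdir(exist_ok=True)
        clash_error = None
    except FileExistsError as exc:
        clash_error = type(exc).__name__

print(created, clash_error)
```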

Creating a file

Path.touch() creates a file at the specified path with mode=0o666, which requests read and write permissions for all users and groups but no execute permission (the mode is combined with the process’s umask). The exist_ok parameter defaults to True, so touching a file that already exists succeeds and updates its modification time.

from pathlib import Path

# Creating a new file at the specified path

Path('D:/SACHIN/Pycharm/pathlib_module/sample.txt').touch()

A file called sample.txt will be created. We’ll get a FileExistsError if we set exist_ok=False and run the code again.

# Creating a new file at the specified path

Path('D:/SACHIN/Pycharm/pathlib_module/sample.txt').touch(exist_ok=False)

----------

FileExistsError: [Errno 17] File exists: 'D:\\SACHIN\\Pycharm\\pathlib_module\\sample.txt'
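A short sketch of both behaviours, run in a temporary directory: touching an existing file keeps its content, while exist_ok=False turns the second call into an error.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.touch()
    sample.write_text('keep me', encoding='utf-8')

    # With the default exist_ok=True the content is left alone.
    sample.touch()
    content_after = sample.read_text(encoding='utf-8')

    # With exist_ok=False an existing file raises FileExistsError.
    try:
        sample.touch(exist_ok=False)
        raised = False
    except FileExistsError:
        raised = True

print(content_after, raised)
```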

Renaming the files and directories

Methods like .with_name() and .with_stem() return a new path object with a different name, but they do not touch the filesystem. To actually rename files and directories on disk, we can use Path.rename().

from pathlib import Path

path = Path('files')

# Renaming the directory

path.rename('docs')

print('Directory renamed successfully.')

----------

Directory renamed successfully.

The directory files will be renamed to docs. What happens if the target file or directory name already exists? On Windows, the code will raise a FileExistsError. On Unix, an existing target file is silently replaced if the user has permission.

path = Path('docs')

# Renaming the directory

path.rename('files')

print('Directory renamed successfully.')

----------

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'docs' -> 'files'

The above code raised an error because a directory named files already exists in the project directory.
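One way to make a rename safer is to check the target first and to use the Path that rename() returns (available since Python 3.8). This is only a sketch: the check and the rename are two separate steps, so another process could still create the target in between.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'files'
    target = Path(tmp) / 'docs'
    source.mkdir()

    # Refuse to overwrite an existing target.
    if target.exists():
        raise FileExistsError(f'{target} already exists')

    # rename() returns a new Path pointing at the target.
    renamed = source.rename(target)

    source_exists = source.exists()
    target_is_dir = target.is_dir()
    renamed_name = renamed.name

print(renamed_name, source_exists, target_is_dir)
```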

Removing the directory

Path.rmdir() deletes the directory specified in the path, but only if it is empty, otherwise, an OSError is raised.

from pathlib import Path

# Removing the directory at the specified path

Path('D:/SACHIN/Pycharm/pathlib_module/files').rmdir()

If we attempt to remove a directory that does not exist, we will receive a FileNotFoundError.

Path('D:/SACHIN/Pycharm/pathlib_module/files').rmdir()

----------

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\\SACHIN\\Pycharm\\pathlib_module\\files'
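The "only if it is empty" rule can be demonstrated in a temporary directory. The sketch below first fails to remove a non-empty directory, then deletes the file inside it with Path.unlink() (a method not covered above) and tries again.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'files'
    folder.mkdir()
    leftover = folder / 'test.txt'
    leftover.touch()

    # rmdir() refuses to delete a directory that still has contents.
    try:
        folder.rmdir()
        removed_first_try = True
    except OSError:
        removed_first_try = False

    # Empty the directory, then remove it.
    leftover.unlink()
    folder.rmdir()
    still_exists = folder.exists()

print(removed_first_try, still_exists)
```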

Reading and Writing Operations

The Path class provides several methods for reading from and writing to files. Assume we have a text file with some data that we want to read and then overwrite.

Opening the file

Before reading or writing data, we need to open the file. The Path class provides an .open() method that opens the file specified by the path. It works in the same way as the built-in open() function.

from pathlib import Path

# Instantiating the path of the file

open_file = Path('sample_file.txt')

# Using the open() method

with open_file.open(mode='r') as file:

    # Reading the content

    print(file.read())

----------

Hi, I am sample file for testing.
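Since .open() takes the same arguments as the built-in open(), the write and append modes work too. A sketch in a temporary directory, with made-up file contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample_file.txt'

    # 'w' creates or truncates the file, 'a' appends to it.
    with sample.open(mode='w', encoding='utf-8') as file:
        file.write('first line\n')
    with sample.open(mode='a', encoding='utf-8') as file:
        file.write('second line\n')

    # The default mode is 'r'; iterate to read line by line.
    with sample.open(encoding='utf-8') as file:
        lines = [line.rstrip('\n') for line in file]

print(lines)
```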

Reading the file

To read the content of the file specified by the path, we can use the Path.read_text() method.

from pathlib import Path

content = Path('sample_file.txt').read_text(encoding='utf-8')

print(content)

----------

Hi, I am sample file for testing.

Writing data to the file

The Path class provides a .write_text() method for writing text data to a file.

from pathlib import Path

# Instantiating the path of the file

path = Path('sample_file.txt')

# Writing data to the file

path.write_text('Hello from GeekPython.')

# Reading the data

print(path.read_text())

----------

Hello from GeekPython.

Similarly, we can use the Path.write_bytes() method to write binary data to a file. It opens the file in binary mode.

# Instantiating the path of the file

path = Path('sample_file.txt')

# Writing binary data to the file

path.write_bytes(b'Hello from GeekPython.')

# Reading the binary data

print(path.read_bytes())

----------

b'Hello from GeekPython.'

We wrote binary data to sample_file.txt with .write_bytes() and read it back with .read_bytes().

Path.read_bytes() opens the file in binary mode and returns its contents as a bytes object.
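The text and binary methods can be mixed on the same file, which makes the role of the encoding visible. In this sketch, write_text() returns the number of characters written, while read_bytes() shows the encoded bytes. The sample string is chosen only because it contains a non-ASCII character.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample_file.txt'

    # write_text() overwrites the file and returns the character count.
    chars_written = path.write_text('Héllo', encoding='utf-8')

    # The same content, seen as raw bytes and as decoded text.
    raw = path.read_bytes()
    text = path.read_text(encoding='utf-8')

print(chars_written, len(raw), text)
```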

Conclusion

The pathlib module provides high-level classes for manipulating file paths. These classes can be used to perform various operations on file paths as well as interact with files to perform I/O operations.

Let’s recall what we’ve learned: