The pathlib module is part of Python’s standard library. It lets us interact with filesystem paths and work with files through the methods and properties of the Path object.
Getting Started With pathlib
The most frequently used class of the pathlib module is Path. It is the best place to start if we are using this module for the first time or are not sure which class to use for our task.

```python
# Importing Path class from pathlib
from pathlib import Path

# Instantiating the Path
path = Path(__file__)
print(path)
```

In the above example, we first imported the Path class from the pathlib module and then instantiated Path with __file__.
This gives us the path to the file we are currently running, main.py, which is printed as an absolute path.

```
D:\SACHIN\Pycharm\pathlib_module\main.py
```
The Path class instantiates the concrete path class for the operating system on which the code is running. Because we’re using Windows, we’ll get the following output if we print the type of path.

```python
print(type(path))

----------
<class 'pathlib.WindowsPath'>
```

Before we get into the methods and properties of Path, it’s important to understand that the pathlib classes are divided into pure paths and concrete paths.
Pure Paths
Pure paths let us manipulate paths written for another operating system, such as handling a Windows path on a Unix machine or vice versa, without ever accessing the filesystem.
Pure paths only support computational operations and do not support I/O operations such as reading, writing, or otherwise manipulating files.
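As a small sketch of this idea, the snippet below handles a Windows-style path and a POSIX-style path side by side. It should behave the same on any operating system because nothing here touches the filesystem. The paths are made up purely for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical paths used only for illustration; neither has to exist.
win_path = PureWindowsPath('C:/Users/demo/report.txt')
posix_path = PurePosixPath('/home/demo/report.txt')

# Purely computational operations work for either flavour on any machine.
print(win_path.parent)    # C:\Users\demo
print(posix_path.parent)  # /home/demo

# Pure paths expose no I/O methods such as read_text().
has_io = hasattr(win_path, 'read_text')
print(has_io)  # False
```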
Pathlib’s PurePath
PurePath is the generic pure path class. Instantiating it creates a pure path object that matches the system we are running on. Consider the example below, in which we instantiate the PurePath class.

```python
# Importing PurePath class from pathlib
from pathlib import PurePath

path = PurePath('main.py')
print(path)
print(type(path))

----------
main.py
<class 'pathlib.PureWindowsPath'>
```

We got a PureWindowsPath object when we ran the above code because we are on a Windows machine. On a non-Windows machine, we would get a PurePosixPath object instead.
PurePath has two subclasses, which are as follows:
- PureWindowsPath
- PurePosixPath
PureWindowsPath
This subclass is implemented for Windows filesystem paths, as the name suggests.
```python
# Importing PureWindowsPath class from pathlib
from pathlib import PureWindowsPath

# Instantiating PureWindowsPath
path = PureWindowsPath('main.py')
print(path)
print(type(path))

----------
main.py
<class 'pathlib.PureWindowsPath'>
```
PurePosixPath
This subclass is used for non-Windows filesystem paths.
```python
# Importing PurePosixPath class from pathlib
from pathlib import PurePosixPath

# Instantiating PurePosixPath
path = PurePosixPath('main.py')
print(path)
print(type(path))

----------
main.py
<class 'pathlib.PurePosixPath'>
```
PurePath Methods And Properties
PurePath provides several methods and properties that allow us to perform various operations on filesystem paths.
Getting the drive name
The PurePath.drive property can be used to extract the drive name from the specified path. We’ll get a string representing the drive name, or an empty string if no drive is present in the path.

```python
from pathlib import PurePath

# Path having a drive name
drive = PurePath('D:/SACHIN/Pycharm/test.py').drive
print(drive)

# Path without a drive name
no_drive = PurePath('/SACHIN/Pycharm/test.py').drive
print(no_drive)

----------
D:

```
The first part of the code has a drive name in its path, which we got in the output, but the second part of the code did not, so we got an empty string.
Getting the root and stem
The root is the leading separator that marks the top-level directory of the path, if there is one, and we can access it with PurePath.root. The stem is the last component of the path without its suffix, which we can access with PurePath.stem.

```python
from pathlib import PureWindowsPath, PurePosixPath

# Getting the root
root = PureWindowsPath('D:/SACHIN/Pycharm/').root
print(root)

# Getting the root from the Unix-like path
u_root = PurePosixPath('/SACHIN/Pycharm/').root
print(u_root)

# Getting the stem
stem = PureWindowsPath('D:/SACHIN/Pycharm/test.py').stem
print(stem)

----------
\
/
test
```
Getting the ancestors of the path
The PurePath.parents property can be used to access the logical ancestors of the path.

```python
from pathlib import PureWindowsPath

ancestor = PureWindowsPath('D:/SACHIN/Pycharm/test.py')
print(ancestor.parents[0])
print(ancestor.parents[1])
print(ancestor.parents[-1])

----------
D:\SACHIN\Pycharm
D:\SACHIN
D:\
```

By indexing PurePath.parents, we were able to access the path’s ancestors. Python 3.10 added support for slices and negative index values for PurePath.parents.
We got the full path except for the file name when we used 0, one directory further back when we used 1, and the beginning portion of the path when we used -1.
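Since PurePath.parents is a sequence, it can also be iterated over. The following is a minimal sketch that lists every ancestor of the same example path; converting to a list first is a cautious choice so that slicing and negative indexes also work on Python versions older than 3.10.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath('D:/SACHIN/Pycharm/test.py')

# Collect the ancestors as strings, nearest first.
ancestors = [str(parent) for parent in path.parents]
print(ancestors)  # ['D:\\SACHIN\\Pycharm', 'D:\\SACHIN', 'D:\\']

# A plain list supports slices and negative indexes on every Python 3 version.
nearest_two = list(path.parents)[:2]
top_level = list(path.parents)[-1]
print(top_level)  # D:\
```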
Getting the parent
The PurePath.parent property allows us to access the logical parent of the path.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('D:/SACHIN/Pycharm/test.py')
print(p.parent)

----------
D:\SACHIN\Pycharm
```

In the above example, the parent directory of test.py is Pycharm/, the parent directory of Pycharm/ is SACHIN/, and the parent directory of SACHIN/ is the drive D:/, which contains all of these directories and files.
That’s why we got the output D:\SACHIN\Pycharm.
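The chain of parents described here can be walked by applying .parent repeatedly. The sketch below uses the same example path and also shows that the drive root is its own parent, so the chain cannot go any higher.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath('D:/SACHIN/Pycharm/test.py')

print(path.parent)                # D:\SACHIN\Pycharm
print(path.parent.parent)         # D:\SACHIN
print(path.parent.parent.parent)  # D:\

# The drive root is its own parent, so the chain stops here.
drive_root = PureWindowsPath('D:/')
stops_at_root = drive_root.parent == drive_root
print(stops_at_root)  # True
```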
Getting the name and suffix
PurePath.name provides access to the name of the path’s final component, while PurePath.suffix provides access to the file extension of the final component. If the file has multiple extensions, we can get the list of file extensions with PurePath.suffixes.

```python
from pathlib import PureWindowsPath

# Accessing the name of the final component of the path
file_name = PureWindowsPath('D:/SACHIN/Pycharm/test.py').name
print(file_name)

# Accessing the suffix of the final component of the path
file_suffix = PureWindowsPath('D:/SACHIN/Pycharm/test.py').suffix
print(file_suffix)

----------
test.py
.py
```

The last component of the path is test.py, and the extension is .py, which is what we got in the output.
What if our file has more than one extension, like test.py.zip? If we want to extract both extensions, we can use PurePath.suffixes.

```python
from pathlib import PureWindowsPath

# Accessing multiple suffixes
file_suffixes = PureWindowsPath('test.py.zip').suffixes
print(file_suffixes)

----------
['.py', '.zip']
```
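One place where suffixes can help is stripping every extension at once, since stem only removes the last one. The helper below, strip_all_suffixes, is a hypothetical name introduced for this sketch and is not part of pathlib.

```python
from pathlib import PurePosixPath


def strip_all_suffixes(path):
    """Return the final component without any of its suffixes (hypothetical helper)."""
    extensions = ''.join(path.suffixes)
    return path.name[:len(path.name) - len(extensions)]


archive = PurePosixPath('backup/test.py.zip')
print(archive.stem)                 # test.py
print(strip_all_suffixes(archive))  # test
```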
Check if a path is absolute
An absolute path is one that has both a root and a drive (if the naming convention allows), and we can use the PurePath.is_absolute() method to determine whether or not a path is absolute. It returns a boolean value.

```python
from pathlib import PureWindowsPath, PurePosixPath

print(PureWindowsPath('D:/SACHIN/').is_absolute())
# True

print(PureWindowsPath('/SACHIN/').is_absolute())
# False

print(PurePosixPath('/SACHIN/').is_absolute())
# True

print(PurePosixPath('D:/SACHIN/').is_absolute())
# False
```

Looking at the first two PureWindowsPath cases, we first get True because the path has both a drive and a root, but then we get False because the path lacks a drive.
In the PurePosixPath cases, we first got True even though the path did not have a drive name, because POSIX paths have no drive names. But when we used a drive name in the path, we got False: without a leading slash the path has no root, and PurePosixPath treats D: as an ordinary directory name.
Combining paths
PurePath.joinpath() allows us to combine the path with each of the arguments passed to it.

```python
from pathlib import PurePath, PureWindowsPath, PurePosixPath

print(PurePath('D:/SACHIN/').joinpath('test.txt'))
# D:\SACHIN\test.txt

print(PurePath('D:/SACHIN/').joinpath(PureWindowsPath('test_dir', 'test.txt')))
# D:\SACHIN\test_dir\test.txt

print(PurePosixPath('/SACHIN/').joinpath(PurePosixPath('test_dir', 'test.txt')))
# /SACHIN/test_dir/test.txt
```
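Paths can also be combined with the / operator, which gives the same result as joinpath(). The sketch below uses an example POSIX path and also shows one behaviour worth knowing: joining an absolute segment discards everything before it.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/SACHIN')

# The / operator joins segments just like joinpath().
joined = base / 'test_dir' / 'test.txt'
print(joined)  # /SACHIN/test_dir/test.txt

same_result = joined == base.joinpath('test_dir', 'test.txt')
print(same_result)  # True

# An absolute segment replaces the path built so far.
replaced = base / '/etc/hosts'
print(replaced)  # /etc/hosts
```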
Matching the path
PurePath.match() takes a glob-style pattern and matches the path against it. When the path matches, it returns True; otherwise, it returns False.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('D:/SACHIN/test.txt').match('*.py')
print(p)

p1 = PureWindowsPath('D:/SACHIN/test.py').match('*.py')
print(p1)

p2 = PureWindowsPath('D:/SACHIN/test/test.py').match('test/*.py')
print(p2)

----------
False
True
True
```

Depending on the platform we’re working on, pattern matching can be case-sensitive.

```python
from pathlib import PureWindowsPath, PurePosixPath

print(PurePosixPath('/test/test.py').match('*.Py'))
print(PureWindowsPath('/test/test.py').match('*.Py'))

----------
False
True
```
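How much of the path a pattern has to cover depends on whether the pattern is relative or absolute. The following sketch, using an example POSIX path, illustrates the difference: a relative pattern is matched from the right, while a pattern starting with a separator has to match the whole path.

```python
from pathlib import PurePosixPath

path = PurePosixPath('/SACHIN/test/test.py')

# A relative pattern is matched against the right-hand end of the path.
tail_match = path.match('test/*.py')
wrong_dir = path.match('SACHIN/*.py')
print(tail_match, wrong_dir)  # True False

# A pattern that starts with a separator must match the whole path.
full_match = path.match('/SACHIN/test/*.py')
partial = path.match('/test/*.py')
print(full_match, partial)  # True False
```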
Changing the name
PurePath.with_name() accepts a name argument and returns a new path with the changed file name.

```python
from pathlib import PureWindowsPath

chg_name = PureWindowsPath('D:/SACHIN/test.txt').with_name('test.py')
print(chg_name)

----------
D:\SACHIN\test.py
```

If there is no name in the path, then we’ll get a ValueError.

```python
from pathlib import PureWindowsPath

no_name = PureWindowsPath('D:/').with_name('test.py')
print(no_name)

----------
Traceback (most recent call last):
  ....
    raise ValueError("%r has an empty name" % (self,))
ValueError: PureWindowsPath('D:/') has an empty name
```
Changing the stem
The PurePath.with_stem() method, available since Python 3.9, returns a new path with a different stem.

```python
from pathlib import PureWindowsPath

chg_stem = PureWindowsPath('D:/SACHIN/example.py').with_stem('test')
print(chg_stem)

----------
D:\SACHIN\test.py
```

A ValueError is raised if the path does not have a name.

```python
from pathlib import PureWindowsPath

no_name = PureWindowsPath('D:/').with_stem('test')
print(no_name)

----------
Traceback (most recent call last):
  ....
    raise ValueError("%r has an empty name" % (self,))
ValueError: PureWindowsPath('D:/') has an empty name
```
Changing the suffix
We can change the suffix using PurePath.with_suffix(). If the file name lacks a suffix, the provided suffix will be appended.

```python
from pathlib import PureWindowsPath

suf = PureWindowsPath('D:/SACHIN/test.py').with_suffix('.txt')
print(suf)

no_suf = PureWindowsPath('D:/SACHIN/test').with_suffix('.py')
print(no_suf)

----------
D:\SACHIN\test.txt
D:\SACHIN\test.py
```

What happens if we supply an empty string? The file’s suffix will be removed.

```python
from pathlib import PureWindowsPath

empty_suf = PureWindowsPath('D:/SACHIN/test.py').with_suffix('')
print(empty_suf)

----------
D:\SACHIN\test
```
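These properties and methods can be combined to derive related file names. The sketch below builds a backup name from the example path (the _backup naming scheme is only an assumption for illustration) and shows that with_suffix() rejects a suffix that does not start with a dot.

```python
from pathlib import PureWindowsPath

source = PureWindowsPath('D:/SACHIN/test.py')

# Build "test_backup.py" from the stem and suffix of the original name.
backup = source.with_name(source.stem + '_backup' + source.suffix)
print(backup)  # D:\SACHIN\test_backup.py

# A suffix without the leading dot is rejected with a ValueError.
try:
    source.with_suffix('txt')
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```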
Concrete Paths
Concrete paths support I/O operations in addition to the computational operations of pure paths. Unlike pure paths, concrete paths can read a file, write data to a file, and query or modify the filesystem.
Concrete paths let us make system calls on path objects. They are subclasses of the pure path classes, and there are three ways to instantiate a concrete path:
- Path()
- WindowsPath()
- PosixPath()
Pathlib’s Path
At the beginning of the article, we saw a glimpse of the Path class, which is a subclass of the PurePath class that represents the concrete path of the filesystem.
When we instantiate the Path class, it generates either a PosixPath or a WindowsPath object, depending on the machine we’re working on.

```python
# Importing Path class from pathlib
from pathlib import Path

# Instantiating the Path
path = Path('D:/SACHIN/Pycharm')
print(path)
print(type(path))

----------
D:\SACHIN\Pycharm
<class 'pathlib.WindowsPath'>
```

Path() created a concrete Windows path because we’re on a Windows machine.
PosixPath
PosixPath is a subclass of both Path and PurePosixPath that represents concrete non-Windows filesystem paths.
Because PosixPath makes system calls, we can’t instantiate it on our machine, which is running Windows.

```python
# Importing PosixPath class from pathlib
from pathlib import PosixPath

# Instantiating PosixPath
path = PosixPath('main.py')
print(path)

----------
Traceback (most recent call last):
  ....
    raise NotImplementedError("cannot instantiate %r on your system"
NotImplementedError: cannot instantiate 'PosixPath' on your system
```

We can only instantiate the class that corresponds to our system. For example, we can instantiate the WindowsPath class on Windows machines and the PosixPath class on POSIX-compliant machines.
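In portable code it is usually simplest to let Path() pick the right class. The snippet below is a small sketch that checks which concrete class was chosen; the use of os.name to predict the result is an assumption made for illustration.

```python
import os
from pathlib import Path, PurePath

path = Path('main.py')

# Path() returns the concrete class that matches the running system.
kind = type(path).__name__
print(kind)  # WindowsPath on Windows, PosixPath elsewhere

expected = 'WindowsPath' if os.name == 'nt' else 'PosixPath'
print(kind == expected)  # True

# Concrete paths are still pure paths, so PurePath features remain available.
is_pure = isinstance(path, PurePath)
print(is_pure)  # True
```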
WindowsPath
WindowsPath is a subclass of both Path and PureWindowsPath that represents concrete Windows filesystem paths.

```python
# Importing WindowsPath class from pathlib
from pathlib import WindowsPath

# Instantiating WindowsPath
path = WindowsPath('main.py')
print(path)
print(type(path))

----------
main.py
<class 'pathlib.WindowsPath'>
```
Path Methods
The Path class provides several methods for performing I/O operations on filesystem paths by interacting with the operating system.
Getting the current working directory and home directory
You may have used os.getcwd() to get the current working directory. Path.cwd() does the same thing, returning a new path object for the current working directory.

```python
# Importing Path class from pathlib
from pathlib import Path

# Getting the current working directory
path = Path.cwd()
print(path)

----------
D:\SACHIN\Pycharm\pathlib_module
```

We obtained the path to our current working directory, and we can see that the path separator is a backslash (\) because we are using the Windows operating system.
Path.home() returns the path to the user’s home directory. If the home directory cannot be resolved, a RuntimeError is raised.

```python
# Importing Path class from pathlib
from pathlib import Path

# Getting the home directory
path = Path.home()
print(path)

----------
C:\Users\SACHIN
```
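To make the relationship with os.getcwd() concrete, here is a short sketch comparing the two and guarding the Path.home() call against the RuntimeError mentioned above. The printed locations depend on the machine, so only the comparisons are shown as output.

```python
import os
from pathlib import Path

cwd = Path.cwd()

# Path.cwd() points at the same directory that os.getcwd() reports.
same_dir = cwd == Path(os.getcwd())
print(same_dir)           # True
print(cwd.is_absolute())  # True

# Path.home() raises RuntimeError when the home directory cannot be resolved.
try:
    home = Path.home()
except RuntimeError:
    home = None
```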
Accessing the components of the path
We’ve seen the PurePath properties that help us access the path’s components. Since Path is a subclass of PurePath, we can use those properties with the Path class as well.

```python
# Importing Path class from pathlib
from pathlib import Path

# Instantiating the path
path = Path('D:/SACHIN/test.py')

# Accessing the drive name
print(path.drive)
# D:

# Accessing the root
print(path.root)
# \

# Accessing the name
print(path.name)
# test.py

# Accessing the stem
print(path.stem)
# test

# Accessing the suffix
print(path.suffix)
# .py

# Accessing the parent
print(path.parent)
# D:\SACHIN
```
Iterating the directories
Using Path.iterdir(), we can get the path objects of the contents of the specified directory.

```python
from pathlib import Path

path = Path('D:/SACHIN/Pycharm/pathlib_module')

# Iterating the pathlib_module directory
for files in path.iterdir():
    print(files)

----------
D:\SACHIN\Pycharm\pathlib_module\.idea
D:\SACHIN\Pycharm\pathlib_module\files
D:\SACHIN\Pycharm\pathlib_module\main.py
D:\SACHIN\Pycharm\pathlib_module\test.py
```

The path in the above code points to the pathlib_module directory, and we obtained the path objects of the directories and files contained within pathlib_module.
Here is another example of the .iterdir() method.

```python
from pathlib import Path

path = Path('files')

for files in path.iterdir():
    print(files)

----------
files\example.md
files\file.py
files\test.txt
```

We iterated through the contents of the files directory, which is located in the current working directory.
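Because iterdir() yields both files and directories, it is often paired with is_file() and is_dir() to tell them apart. The sketch below runs inside a temporary directory so that it does not depend on any existing folder; the file and directory names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Create a small, throwaway directory layout.
    (base / 'notes.txt').touch()
    (base / 'script.py').touch()
    (base / 'sub').mkdir()

    # Separate the entries into files and directories.
    files = sorted(entry.name for entry in base.iterdir() if entry.is_file())
    dirs = sorted(entry.name for entry in base.iterdir() if entry.is_dir())

print(files)  # ['notes.txt', 'script.py']
print(dirs)   # ['sub']
```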
Filesystem Modification
Creating a directory
Path.mkdir() creates a new directory at the specified path. The default mode=0o777 requests read, write, and execute permissions for all users and groups; the mode is combined with the process’s umask, so the resulting permissions can be more restrictive.

```python
from pathlib import Path

# Creating a new dir at the specified path
Path('D:/SACHIN/Pycharm/pathlib_module/new_dir').mkdir(mode=0o777)
```

When we execute the above code, a new directory called new_dir is created in the pathlib_module directory.
If the path already exists, we will receive a FileExistsError. If we run the above code again, we’ll get the following result.

```
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'D:\\SACHIN\\Pycharm\\pathlib_module\\new_dir'
```

The path we specified already exists, which is why the directory is not created. However, .mkdir() has an exist_ok parameter that, when set to True, ignores the error.

```python
from pathlib import Path

Path('D:/SACHIN/Pycharm/pathlib_module/new_dir').mkdir(exist_ok=True)
print('Directory created.')

----------
Directory created.
```

Note: Even with exist_ok=True, a FileExistsError is still raised if the path’s final component is an existing non-directory file.
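.mkdir() also accepts a parents parameter for creating missing intermediate directories. The following sketch, run in a temporary directory with a made-up folder layout, shows the call failing without parents=True and succeeding with it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested layout; none of these directories exist yet.
    nested = Path(tmp) / 'reports' / '2024' / 'drafts'

    # Without parents=True the missing intermediate directories cause an error.
    try:
        nested.mkdir()
        failed_without_parents = False
    except FileNotFoundError:
        failed_without_parents = True

    # parents=True creates the whole chain; exist_ok=True makes a repeat call harmless.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(failed_without_parents)  # True
print(created)                 # True
```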
Creating a file
Path.touch() allows us to create a file at the specified path. The default mode=0o666 requests read and write permissions for all users and groups but no execute permission, and it is likewise combined with the process’s umask. The exist_ok parameter defaults to True.

```python
from pathlib import Path

# Creating a new file at the specified path
Path('D:/SACHIN/Pycharm/pathlib_module/sample.txt').touch()
```

A file called sample.txt will be created. We’ll get a FileExistsError if we set exist_ok=False and run the code again.

```python
from pathlib import Path

# Creating a new file at the specified path
Path('D:/SACHIN/Pycharm/pathlib_module/sample.txt').touch(exist_ok=False)

----------
FileExistsError: [Errno 17] File exists: 'D:\\SACHIN\\Pycharm\\pathlib_module\\sample.txt'
```
Renaming the files and directories
Methods like .with_name() and .with_stem() return a new path object with a different name, but they do not rename anything on disk. To actually rename files and directories, we can use Path.rename().

```python
from pathlib import Path

path = Path('files')

# Renaming the directory
path.rename('docs')
print('Directory renamed successfully.')

----------
Directory renamed successfully.
```

The directory files will be renamed to docs. What happens if the target file or directory name already exists? On Windows, the code will raise a FileExistsError. On Unix, an existing target file is silently replaced if the user has permission.

```python
from pathlib import Path

path = Path('docs')

# Renaming the directory
path.rename('files')
print('Directory renamed successfully.')

----------
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'docs' -> 'files'
```

The above code threw an error because a directory named files already exists in the project directory.
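The difference between computing a new name and renaming on disk can be seen in a short experiment. This sketch works inside a temporary directory, and the file names draft.txt and final.txt are placeholders chosen for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / 'draft.txt'
    old.touch()

    # with_name() only computes a new path object; nothing changes on disk.
    planned = old.with_name('final.txt')
    exists_before = planned.exists()

    # rename() performs the actual change in the filesystem.
    old.rename(planned)
    exists_after = planned.exists()
    old_gone = not old.exists()

print(exists_before)  # False
print(exists_after)   # True
print(old_gone)       # True
```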
Removing the directory
Path.rmdir() deletes the directory specified by the path, but only if it is empty; otherwise, an OSError is raised.

```python
from pathlib import Path

# Removing the directory at the specified path
Path('D:/SACHIN/Pycharm/pathlib_module/files').rmdir()
```

If we attempt to remove a directory that does not exist, we will receive a FileNotFoundError.

```python
from pathlib import Path

Path('D:/SACHIN/Pycharm/pathlib_module/files').rmdir()

----------
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\\SACHIN\\Pycharm\\pathlib_module\\files'
```
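The empty-directory requirement can be checked with a small sketch like the one below. It uses a temporary directory and an invented file name, and removes the file with Path.unlink() before the directory itself is deleted.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'files'
    target.mkdir()
    (target / 'a.txt').touch()

    # rmdir() refuses to delete a directory that still has content.
    try:
        target.rmdir()
        removed_while_full = True
    except OSError:
        removed_while_full = False

    # Once the file is gone, the now-empty directory can be removed.
    (target / 'a.txt').unlink()
    target.rmdir()
    removed_when_empty = not target.exists()

print(removed_while_full)  # False
print(removed_when_empty)  # True
```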
Reading and Writing Operations
The Path class provides several methods to perform reading and writing operations on a file. Assume we have a text file with some data and we want to read and write that data.
Opening the file
Before we can read from or write to a file, it has to be opened. The Path class provides an .open() method that opens the file specified by the path. You may have already used the built-in open(); this method works in the same way.

```python
from pathlib import Path

# Instantiating the path of the file
open_file = Path('sample_file.txt')

# Using the open() method
with open_file.open(mode='r') as file:
    # Reading the content
    print(file.read())

----------
Hi, I am sample file for testing.
```
Reading the file
To read the content of the file specified by the path, we can use the Path.read_text() method.

```python
from pathlib import Path

content = Path('sample_file.txt').read_text(encoding='utf-8')
print(content)

----------
Hi, I am sample file for testing.
```
Writing data to the file
The Path class provides a .write_text() method for writing text data to a file.

```python
from pathlib import Path

# Instantiating the path of the file
path = Path('sample_file.txt')

# Writing data to the file
path.write_text('Hello from GeekPython.')

# Reading the data
print(path.read_text())

----------
Hello from GeekPython.
```

Similarly, we can use the Path.write_bytes() method to write binary data to a file. It opens the file in binary mode.

```python
from pathlib import Path

# Instantiating the path of the file
path = Path('sample_file.txt')

# Writing binary data to the file
path.write_bytes(b'Hello from GeekPython.')

# Reading the binary data
print(path.read_bytes())

----------
b'Hello from GeekPython.'
```

We wrote the binary data to sample_file.txt and then read the file content back using the .read_bytes() method.
Path.read_bytes() opens the file in binary mode and returns the contents of the file as a bytes object.
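One detail worth keeping in mind is that .write_text() replaces whatever the file already contains. A minimal sketch of this, along with appending through .open() in 'a' mode, is shown below; it writes to a temporary directory rather than the sample_file.txt used above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / 'sample_file.txt'

    # Each write_text() call replaces the previous content.
    log_file.write_text('first\n', encoding='utf-8')
    log_file.write_text('second\n', encoding='utf-8')
    after_overwrite = log_file.read_text(encoding='utf-8')

    # Opening in append mode keeps the existing content.
    with log_file.open(mode='a', encoding='utf-8') as file:
        file.write('third\n')
    after_append = log_file.read_text(encoding='utf-8')

print(repr(after_overwrite))  # 'second\n'
print(repr(after_append))     # 'second\nthird\n'
```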
Conclusion
The pathlib module provides high-level classes for manipulating file paths. These classes can be used to perform various operations on file paths as well as interact with files to perform I/O operations.
Let’s recall what we’ve learned:
- Pure path and concrete path classes
- Path operations using the PurePath class
- Path class for instantiating concrete paths
- Methods of the Path class
- Reading and writing files
- Modifying the filesystem
Reference – docs.python.org/3/library/pathlib.html
That’s all for now
Keep Coding