The fileinput module is part of the standard library and is used to iterate over the lines of multiple files as one continuous stream, reading the files one after another. Python’s built-in open() function can also be used to iterate over file content, but it handles only one file per call.
You’ll explore the classes and functions provided by the fileinput module to iterate over multiple files.
You can also use fileinput to iterate over a single file, but the open() function is usually the better choice for that.
Basic Usage
    import fileinput

    # Create a FileInput instance by passing multiple files
    stream = fileinput.input(files=('test.txt', 'sample.txt', 'index.html'))

    # Iterate over the content line by line
    for data in stream:
        print(data)
The fileinput module is imported first, and then fileinput.input() is called with a tuple of files (test.txt, sample.txt, and index.html). The call returns a FileInput instance, which is an iterator over the lines of those files.
The contents of the files are then iterated and printed using a for loop.
    Hi, I am a test file.
    Hi, I am a sample file for testing.
    <html lang="en">
    <head>
    <title>Test HTML File</title>
    </head>
    <body>
    <h1>Hi, I am a simple HTML File.</h1>
    </body>
    </html>
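To make the "one file after another" behaviour concrete, here is a minimal, self-contained sketch. It assumes two throwaway files with hypothetical names (first.txt and second.txt) created in a temporary directory, so it does not depend on the sample files used in this article.

```python
import fileinput
import os
import tempfile

# Create two small throwaway files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")    # hypothetical file name
    second = os.path.join(tmp, "second.txt")  # hypothetical file name
    with open(first, "w") as handle:
        handle.write("a1\na2\n")
    with open(second, "w") as handle:
        handle.write("b1\n")

    # The files are consumed in order, as a single stream of lines.
    with fileinput.input(files=(first, second)) as stream:
        lines = [line.rstrip("\n") for line in stream]

print(lines)  # ['a1', 'a2', 'b1']
```

All lines of the first file arrive before any line of the second, which is why the result is a flat list in file order.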
Another approach is to use fileinput.input() as a context manager. This method is safer because it ensures that the FileInput instance is closed even if an exception occurs.
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt')) as files:
        for data in files:
            print(data)
In the above demonstration, fileinput.input() is used as a context manager with the with statement.
The call returns an iterator, which the as clause assigns to the files variable. The data is then iterated through that variable, and the files are closed automatically when the block ends.
    Hi, I am a test file.
    Hi, I am a sample file for testing.
The fileinput.input() Function
The fileinput.input() function is the primary interface of the fileinput module, and it covers most of what the module is used for. You saw a glimpse of the fileinput.input() function in the previous section; this time, you’ll learn more about it.
Syntax
fileinput.input(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)
Parameters:
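- files: Defaults to None. Takes a single file name or a sequence of file names to be processed. If it is None, the names are taken from sys.argv[1:], falling back to sys.stdin when that list is empty.
- inplace: Defaults to False. When set to True, the files can be modified directly, because standard output is redirected into the file being read.
- backup: Defaults to an empty string. Specifies the extension for the backup files that are kept when inplace is set to True.
- mode: Defaults to 'r'. Files can only be opened for reading, so the accepted values are 'r' and 'rb'. The older 'rU' and 'U' modes were removed in Python 3.11.
- openhook: Defaults to None. A custom function for controlling how files are opened. It cannot be combined with inplace.
- encoding: Defaults to None. Specifies the encoding to be used to read the files (added in Python 3.10).
- errors: Defaults to None. Specifies how encoding and decoding errors should be handled (added in Python 3.10).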
Modifying the Files Before Reading
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt'), inplace=True) as files:
        for data in files:
            modified_content = data.lower()
            # Each line keeps its own newline, so suppress the one print() adds
            print(modified_content, end='')

The inplace parameter is set to True in the above code. While a file is being read, standard output is redirected into that file, so whatever is printed inside the loop replaces the file’s original content.
The code above will lowercase the content of both files (test.txt and sample.txt).
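Because inplace=True writes back only what is printed, a line can be dropped simply by not printing it. The following is a minimal sketch of that idea, assuming a throwaway file with the hypothetical name notes.txt in a temporary directory.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("Keep Me\n# drop me\nAnd Me\n")

    with fileinput.input(files=path, inplace=True) as stream:
        for line in stream:
            if line.startswith("#"):
                continue  # nothing is printed, so this line is removed
            # stdout is redirected into the file while inplace is active
            print(line.lower(), end="")

    # Read the file back to see what was written
    with open(path) as handle:
        result = handle.read()

print(repr(result))  # 'keep me\nand me\n'
```

Once the with block ends, standard output is restored, which is why the final print() reaches the console again.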
Storing Backup of Files
When the inplace parameter is set to True, the original files are overwritten. To keep a copy of each file in its original state, pass an extension through the backup parameter.
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt'),
                         inplace=True, backup='.bak') as files:
        for data in files:
            modified_content = data.capitalize()
            # Suppress the extra newline so no blank lines are written
            print(modified_content, end='')
The above code will capitalize each line, and the original files will be saved as test.txt.bak and sample.txt.bak because of backup='.bak'.
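One way to check that the backup really holds the untouched content is to read both files after the edit. This sketch assumes a throwaway file named report.txt and the extension .orig, both chosen only for illustration.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("hello world\n")

    # Rewrite the file in upper case and keep the original as report.txt.orig
    with fileinput.input(files=path, inplace=True, backup=".orig") as stream:
        for line in stream:
            print(line.upper(), end="")

    with open(path) as handle:
        edited = handle.read()
    with open(path + ".orig") as handle:
        original = handle.read()

print(repr(edited))    # 'HELLO WORLD\n'
print(repr(original))  # 'hello world\n'
```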
Controlling the Opening of the File
    import fileinput

    def custom_open(filename, mode):
        # Append a note, closing the handle before the file is reopened
        with open(filename, "a+") as data:
            data.write(" Data added through function.")
        # Return the file opened in the mode requested by fileinput
        return open(filename, mode)

    with fileinput.input(files=("test.txt", "sample.txt"), openhook=custom_open) as file:
        for data in file:
            print(data)
The custom_open() function takes two parameters, filename and mode. It opens the file in append + read mode, writes the string, and then returns a new file object opened in the requested mode.
"The hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object." (Source: Python documentation)
The files are then passed to the fileinput.input() function, and the openhook parameter is set to custom_open, which puts the custom_open() function in charge of opening the files. Finally, the file content is iterated and printed.
    Hi, i am a test file. Data added through function.
    Hi, i am a sample file for testing. Data added through function.
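The module also ships ready-made hooks, so a custom function is not always needed. The sketch below uses fileinput.hook_compressed to read a plain file and a gzip-compressed file in one pass. The file names are hypothetical, and binary mode is an assumption made here so that both files yield bytes regardless of the Python version.

```python
import fileinput
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt")       # hypothetical file name
    packed = os.path.join(tmp, "packed.txt.gz")  # hypothetical file name
    with open(plain, "wb") as handle:
        handle.write(b"plain line\n")
    with gzip.open(packed, "wb") as handle:
        handle.write(b"gzipped line\n")

    # hook_compressed picks the opener from the file extension (.gz, .bz2)
    with fileinput.input(files=(plain, packed), mode="rb",
                         openhook=fileinput.hook_compressed) as stream:
        collected = [raw.decode("utf-8").rstrip("\n") for raw in stream]

print(collected)  # ['plain line', 'gzipped line']
```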
Reading Unicode Characters
Suppose you have a file containing Unicode characters and need to read it. To read such characters correctly, the file must be decoded with the right encoding.
    import fileinput

    with fileinput.input(files='test_unicode.txt', encoding='utf-8') as files:
        for data in files:
            print(data)
UTF-8 can represent any Unicode character, so the encoding parameter is set to utf-8.
    πππ
Handling Errors
To control how decoding errors are handled, use the errors parameter. Take the above code as an example: if the encoding is not specified and the platform’s default encoding cannot decode the file, the code raises a UnicodeDecodeError.
    import fileinput

    with fileinput.input(files='test_bin.txt', errors='ignore') as files:
        for data in files:
            print(data)

Output

    Γ°ΕΈΛΓ°ΕΈΛβΓ°ΕΈΛβ¦
The errors parameter is set to ignore, which means that bytes that cannot be decoded are skipped silently. The errors parameter can also be set to strict (raise an exception if an error occurs) or replace (substitute a replacement marker for the malformed data).
Functions to Access Input File Information
Several functions give access to information about the input files that are being processed through the fileinput.input() function.
Getting the File Names
The fileinput.filename() function returns the name of the file that is currently being read.
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt')) as files:
        for data in files:
            print(f"File: {fileinput.filename()}")
            print(data)
Output
    File: test.txt
    Hi, i am a test file. Data added through function. Added data to the file.
    File: sample.txt
    Hi, i am a sample file for testing. Data added through function. Added data to the file.
Getting the File Descriptor and Line and File Line Number
The fileinput.fileno() function returns the active file’s file descriptor, the fileinput.lineno() function returns the cumulative line number across all files, and the fileinput.filelineno() function returns the line number within the currently processed file.
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt')) as files:
        for data in files:
            print(f"{fileinput.filename()}'s File Descriptor: {fileinput.fileno()}")
            print(f"{fileinput.filename()}'s File Line Number: {fileinput.filelineno()}")
            print(f"{fileinput.filename()}'s File Cumulative Line No.: {fileinput.lineno()}")
Output
    test.txt's File Descriptor: 3
    test.txt's File Line Number: 1
    test.txt's File Cumulative Line No.: 1
    sample.txt's File Descriptor: 3
    sample.txt's File Line Number: 1
    sample.txt's File Cumulative Line No.: 2
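The difference between the two counters is easier to see when a file has more than one line. The sketch below assumes two throwaway files, a.txt with two lines and b.txt with one, and records both counters for every line read.

```python
import fileinput
import os
import tempfile

rows = []
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")  # hypothetical file name
    b = os.path.join(tmp, "b.txt")  # hypothetical file name
    with open(a, "w") as handle:
        handle.write("x\ny\n")
    with open(b, "w") as handle:
        handle.write("z\n")

    with fileinput.input(files=(a, b)) as stream:
        for line in stream:
            rows.append((
                os.path.basename(fileinput.filename()),
                fileinput.filelineno(),  # restarts at 1 for each file
                fileinput.lineno(),      # keeps counting across files
            ))

for row in rows:
    print(row)
# ('a.txt', 1, 1)
# ('a.txt', 2, 2)
# ('b.txt', 1, 3)
```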
Checking Reading Status
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt')) as files:
        for data in files:
            print(f"Read First Line: {fileinput.isfirstline()}")
            print(f"Last Line Read From sys.stdin: {fileinput.isstdin()}")

Output

    Read First Line: True
    Last Line Read From sys.stdin: False
    Read First Line: True
    Last Line Read From sys.stdin: False
The fileinput.isfirstline() function returns True if the line just read is the first line of its file and False otherwise. Since both files contain a single line, it returned True for each.
The fileinput.isstdin() function returns True if the last line was read from sys.stdin and False otherwise.
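A common use of fileinput.isfirstline() is to emit a header whenever a new file starts. The following sketch assumes two throwaway files (a.txt and b.txt) and builds a small report with one header per file.

```python
import fileinput
import os
import tempfile

report = []
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    # Hypothetical file names and contents, used only for this example
    for name, text in (("a.txt", "x\ny\n"), ("b.txt", "z\n")):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)

    with fileinput.input(files=paths) as stream:
        for line in stream:
            if fileinput.isfirstline():
                # A new file has just started, so add its header first
                report.append(f"== {os.path.basename(fileinput.filename())} ==")
            report.append(line.rstrip("\n"))

print("\n".join(report))
```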
Closing the File
When fileinput.input() is used as a context manager with the with statement, the files are closed automatically. The fileinput.close() function can also be called explicitly to release the resources as soon as the work is done.
    import fileinput

    with fileinput.input(files=('test.txt', 'sample.txt')) as file:
        for data in file:
            # Count the characters explicitly, ignoring the trailing newline
            if len(data.rstrip('\n')) > 25:
                fileinput.close()
                print('File has more than 25 characters.')
            else:
                print(data)
The above code demonstrates the use of the fileinput.close() function. If a line contains more than 25 characters, the input is closed and a message is printed; otherwise, the line itself is printed.

    File has more than 25 characters.

Because the first line contained more than 25 characters, the input was closed and the message was printed. Closing the input also ends the loop, so the second file was never read.
The FileInput Class
The fileinput.FileInput class is an object-oriented alternative to the fileinput.input() function. The parameters are identical to those of the input() function.
Syntax
fileinput.FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)
Example
    import fileinput

    class OpenMultipleFiles:
        def __init__(self, *args):
            self.args = args

        def custom_open(self, filename, mode):
            # Append a note, closing the handle before the file is reopened
            with open(filename, "a+") as data:
                data.write(" Added data to the file.")
            return open(filename, mode)

        def read(self):
            # Reuse this instance's own method as the open hook
            with fileinput.FileInput(files=self.args, openhook=self.custom_open) as file:
                for data in file:
                    print(data)

    obj = OpenMultipleFiles('test.txt', 'sample.txt')
    obj.read()

The class OpenMultipleFiles is defined in the above code. The class has an __init__ method that takes a variable number of arguments and stores them in self.args.
A custom_open method is defined within the class. It opens the file in append + read mode, writes some data to the file, and returns a new file object opened in the requested mode.
The read method creates an instance of fileinput.FileInput, passing self.args as the files argument and self.custom_open as the openhook parameter. The contents of the files are then iterated and printed.
Finally, an instance of the OpenMultipleFiles class is created with the file names (test.txt and sample.txt) and stored in the obj variable. The read method is then invoked on obj to read the specified files.
    Hi, i am a test file. Data added through function. Added data to the file.
    Hi, i am a sample file for testing. Data added through function. Added data to the file.
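One practical reason to reach for the class is that each FileInput object carries its own state. The fileinput.input() function keeps a single module-level state, so calling it again while the first input is still active raises a RuntimeError. The sketch below, which assumes two throwaway files with hypothetical names, reads two independent inputs side by side.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "numbers.txt")  # hypothetical file name
    right = os.path.join(tmp, "words.txt")   # hypothetical file name
    with open(left, "w") as handle:
        handle.write("1\n2\n")
    with open(right, "w") as handle:
        handle.write("one\ntwo\n")

    # Two separate FileInput objects can be active at the same time
    with fileinput.FileInput(files=left) as numbers, \
         fileinput.FileInput(files=right) as words:
        pairs = [(n.rstrip("\n"), w.rstrip("\n")) for n, w in zip(numbers, words)]

print(pairs)  # [('1', 'one'), ('2', 'two')]
```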
Comparison
Let’s see how long it takes to process the contents of multiple files using the open() function and the fileinput.input() function.
    import timeit

    # open() function code
    code = '''
    with open('test.txt') as f1, open('sample.txt') as f2:
        f1.read()
        f2.read()
    '''

    print(f"Open Function Benchmark: {timeit.timeit(stmt=code, number=1000)}")

    # fileinput code
    setup = 'import fileinput'

    code = '''
    with fileinput.input(files=('test.txt', 'sample.txt')) as file:
        for data in file:
            data
    '''

    print(f"Fileinput Benchmark: {timeit.timeit(setup=setup, stmt=code, number=1000)}")
Using the timeit module, the above code measures how long it takes to process the contents of both files 1000 times with the open() function and with the fileinput.input() function. This helps determine which approach is more efficient.
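    Open Function Benchmark: 0.3948998999840114
    Fileinput Benchmark: 0.4962893000047188

In this run, the open() function was somewhat faster. The exact figures will vary with the machine and the size of the files.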
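The benchmark above reads each file whole with open() but line by line with fileinput, so the two snippets do slightly different work. For a closer comparison, the line-by-line behaviour can be reproduced with open() alone. The helper name iter_lines below is hypothetical, and the sketch only checks that both approaches yield the same lines; it makes no timing claim.

```python
import fileinput
import os
import tempfile

def iter_lines(paths):
    """Yield the lines of each path in turn, using only open()."""
    for path in paths:
        with open(path) as handle:
            yield from handle

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    # Hypothetical file names and contents, used only for this example
    for name, text in (("a.txt", "a1\na2\n"), ("b.txt", "b1\n")):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)

    with_open = list(iter_lines(paths))
    with fileinput.input(files=paths) as stream:
        with_fileinput = list(stream)

print(with_open == with_fileinput)  # True
```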
Limitations
Every module has its limitations, and the fileinput module is no exception.
- It does not read a file as a whole; it only iterates through the contents line by line.
- It cannot open files for writing or appending. The only way to change a file is to rewrite it through inplace=True.
- It cannot perform advanced file-handling operations, such as seeking within a file.
- It can be slower than open(), as the benchmark above suggests, which may matter when processing large files.
Conclusion
The fileinput module provides functions to process one or more files line by line. The fileinput.input() function is the primary interface of the fileinput module, and it provides parameters to give you more control over how the files are processed.
Let’s recall what you’ve learned:
- An overview of the fileinput module
- Basic usage of fileinput.input() with and without a context manager
- The fileinput.input() function and its parameters, with examples
- A glimpse of the FileInput class
- A comparison of the fileinput.input() function with the open() function for processing multiple files
- Some limitations of the fileinput module
That’s all for now
Keep coding