How to Read Multiple Files Simultaneously With fileinput Module In Python

The fileinput module is part of the standard library and is used to iterate over the contents of multiple files in a single loop, one file after another. Python’s built-in open() function can also be used to iterate over file content, but only for one file at a time.

You’ll explore the classes and functions provided by the fileinput module to iterate over multiple files.

You can also use fileinput to iterate over a single file, but the open() function is the better choice for that.

Basic Usage

The fileinput module is imported first, and then a fileinput instance is created by calling fileinput.input() with a tuple of files (test.txt, sample.txt, and index.html). The call returns an iterator.

The contents of the files are then iterated over and printed with a for loop.
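The steps described in this section can be sketched roughly as follows. The temporary directory and the file contents are assumptions made only so the example runs anywhere; the file names are the ones mentioned in the text.

```python
import fileinput
import tempfile
from pathlib import Path

# Create three small files so the example is self-contained (illustrative content).
workdir = Path(tempfile.mkdtemp())
contents = {
    "test.txt": "Hello from test.txt\n",
    "sample.txt": "Hello from sample.txt\n",
    "index.html": "<h1>Hello from index.html</h1>\n",
}
for name, text in contents.items():
    (workdir / name).write_text(text, encoding="utf-8")

# Pass a tuple of file names; fileinput.input() returns an iterable object.
files = tuple(str(workdir / name) for name in contents)
stream = fileinput.input(files=files)

lines = []
for line in stream:      # lines arrive file after file, in the order given
    lines.append(line)
    print(line, end="")

stream.close()           # release the underlying file handles
```

Note that the files are read sequentially: every line of test.txt is produced before the first line of sample.txt.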

Another approach is to use fileinput.input() as a context manager. This method is safer because it ensures that the fileinput instance is closed even if an exception occurs.

Here, fileinput.input() is used with the with statement. The call returns an iterator that is assigned to the files variable (through the as clause), and the data is then iterated over using that variable.
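A minimal sketch of the context-manager form is shown here. The temporary files and their contents are assumptions for illustration; the variable name files follows the text.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative input files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "first file\n"), ("sample.txt", "second file\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

lines = []
# The with statement closes the fileinput instance even if the loop raises.
with fileinput.input(files=paths) as files:
    for line in files:
        lines.append(line)
        print(line, end="")
```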

The fileinput.input() Function

The fileinput.input() function is the primary interface of the fileinput module; for most purposes it is the only function you need. You saw a glimpse of it in the previous section, and this time you’ll learn more about it.

Syntax

fileinput.input(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)

Parameters:

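files: Defaults to None. Takes a single file name or a sequence of file names to process. If omitted, the names are taken from sys.argv[1:], and if that is empty too, sys.stdin is read.

inplace: Defaults to False. When set to True, standard output is redirected to the file being read, so the file can be modified directly.

backup: Defaults to an empty string. Specifies the extension for the backup files that are kept when inplace is set to True.

mode: Defaults to 'r'. Files can only be opened for reading, so the accepted values are 'r' and 'rb'. The older 'rU' and 'U' modes were deprecated and are no longer accepted as of Python 3.11.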

openhook: Defaults to None. A custom function for controlling how files are opened.

encoding: Defaults to None. Specifies the encoding used to read the files (available since Python 3.10).

errors: Defaults to None. Specifies how encoding and decoding errors should be handled (available since Python 3.10).

Modifying the Files Before Reading

When the inplace parameter is set to True, standard output is redirected into the file being read, so anything printed inside the loop replaces the file’s original content.

For example, printing line.lower() for every line lowercases the content of both files (test.txt and sample.txt).
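The in-place edit can be sketched as follows, assuming two small files with mixed-case content created in a temporary directory for the demonstration.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative files with mixed-case content.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "HELLO World\n"), ("sample.txt", "Sample TEXT\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

# With inplace=True, print() writes into the file currently being read.
with fileinput.input(files=paths, inplace=True) as stream:
    for line in stream:
        print(line.lower(), end="")

# Read the files back to confirm they were rewritten.
result = [Path(p).read_text(encoding="utf-8") for p in paths]
print(result)
```

Because standard output is redirected while the loop runs, any debugging print() inside the loop would also end up in the file.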

Storing Backup of Files

When the inplace parameter is set to True, the original files are overwritten, but their original state can be preserved in separate files using the backup parameter.

With backup='.bak', the content is capitalized in place and the original files are saved as test.txt.bak and sample.txt.bak.
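A rough sketch of an in-place edit with a backup follows. The file contents are assumptions, and str.capitalize() is used here as one plausible reading of "capitalize".

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative lowercase files.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "hello world\n"), ("sample.txt", "sample text\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

# backup=".bak" keeps the untouched original next to each edited file.
with fileinput.input(files=paths, inplace=True, backup=".bak") as stream:
    for line in stream:
        print(line.capitalize(), end="")

edited = [Path(p).read_text(encoding="utf-8") for p in paths]
backups = [Path(p + ".bak").read_text(encoding="utf-8") for p in paths]
print(edited)
print(backups)
```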

Controlling the Opening of the File

The custom_open() function is defined with two parameters, filename and mode. It opens the file in append-and-read mode ('a+'), writes a string to it, and returns the file object.

According to the Python documentation, the hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object.

The files are then passed to the fileinput.input() function with the openhook parameter set to custom_open, which puts custom_open() in charge of opening each file. Finally, the file content is iterated over and printed. Note that openhook cannot be combined with inplace=True.
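The hook described here might look like the sketch below. The appended string and the file contents are assumptions. The seek(0) call is an addition that the description does not mention: without it, the file position stays at the end after the write and the loop would read nothing.

```python
import fileinput
import tempfile
from pathlib import Path

def custom_open(filename, mode):
    # fileinput passes its own mode ('r'); this hook deliberately ignores it
    # and opens the file for appending and reading instead.
    file = open(filename, "a+", encoding="utf-8")
    file.write("Appended by custom_open\n")
    file.seek(0)  # rewind so the content can be read from the start
    return file

# Illustrative input files.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "first file\n"), ("sample.txt", "second file\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

lines = []
with fileinput.input(files=paths, openhook=custom_open) as stream:
    for line in stream:
        lines.append(line)
        print(line, end="")
```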

Reading Unicode Characters

Suppose you have a file containing non-ASCII Unicode characters. To read it correctly, it must be decoded with the encoding it was saved in.

UTF-8 is the most common encoding for such files, so the encoding parameter is set to utf-8.

Handling Errors

To control what happens when a file cannot be decoded, use the errors parameter. Take the previous example: if the encoding did not match the file’s actual encoding, or was left unspecified on a system whose default encoding cannot decode the file, the code would raise a UnicodeDecodeError.

When the errors parameter is set to ignore, malformed data is skipped silently. It can also be set to strict (raise an exception when an error occurs) or replace (substitute a replacement marker for the malformed data).

Functions to Access Input File Information

There are some functions that can be used to access the information of the input files which are being processed using the fileinput.input() function.

Getting the File Names

The fileinput.filename() function returns the name of the file currently being read. Before the first line has been read, it returns None.
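A short sketch of fileinput.filename() in use is shown here. The files live in a temporary directory (an assumption), so only the base name is kept for display.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative single-line files.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "first file\n"), ("sample.txt", "second file\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

seen = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        # filename() reports the path exactly as it was passed in.
        name = Path(fileinput.filename()).name
        seen.append((name, line.rstrip("\n")))
        print(f"{name}: {line}", end="")
```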

Getting the File Descriptor and Line and File Line Number

The fileinput.fileno() function returns the file descriptor of the active file (or -1 when no file is open), the fileinput.lineno() function returns the cumulative line number across all files, and the fileinput.filelineno() function returns the line number within the file currently being read.
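The three functions can be compared side by side in a sketch like the following, assuming two illustrative files of two lines each. The actual descriptor values depend on the operating system, so they are only checked for being valid.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative files with two lines each.
workdir = Path(tempfile.mkdtemp())
paths = []
for name in ("test.txt", "sample.txt"):
    path = workdir / name
    path.write_text("line one\nline two\n", encoding="utf-8")
    paths.append(str(path))

descriptors = []
numbers = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        descriptors.append(fileinput.fileno())
        # lineno() keeps counting across files; filelineno() restarts per file.
        numbers.append((fileinput.lineno(), fileinput.filelineno()))

print(numbers)
```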

Checking Reading Status

The fileinput.isfirstline() function returns True if the line just read is the first line of its file; otherwise it returns False. When every file contains a single line, it returns True for each line.

The fileinput.isstdin() function returns True if the last line was read from sys.stdin; otherwise it returns False.
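Both status functions can be observed in a sketch like this one. It assumes one two-line file and one single-line file (illustrative content), both regular files on disk, so isstdin() is expected to stay False.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative files: two lines in the first, one line in the second.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "a\nb\n"), ("sample.txt", "c\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

flags = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        flags.append((fileinput.isfirstline(), fileinput.isstdin()))

print(flags)
```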

Closing the File

When fileinput.input() is used as a context manager with the with statement, the files are closed automatically. The fileinput.close() function can also be called explicitly to release the resources once the work is done.

For example, fileinput.close() can be called as soon as a line longer than 25 characters is read, printing a message instead of the content; shorter lines are printed as usual.

If the file contains a line longer than 25 characters, the sequence is closed and the message is printed.
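An explicit close could be sketched as follows. The 25-character threshold comes from the text, while the file content and the message wording are assumptions.

```python
import fileinput
import tempfile
from pathlib import Path

# Illustrative file whose second line exceeds 25 characters.
workdir = Path(tempfile.mkdtemp())
path = workdir / "test.txt"
path.write_text(
    "short line\n"
    "this line is definitely longer than twenty-five characters\n"
    "never reached\n",
    encoding="utf-8",
)

messages = []
for line in fileinput.input(files=[str(path)]):
    if len(line) > 25:
        fileinput.close()          # release the file right away
        messages.append("closed")
        print("Line too long, input closed.")
        break                      # nothing more to read after close()
    messages.append(line.rstrip("\n"))
    print(line, end="")
```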

The FileInput Class

The fileinput.FileInput class is an object-oriented alternative to the fileinput.input() function. The parameters are identical to those of the input() function.

Syntax

fileinput.FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)

Example

The example defines a class named OpenMultipleFiles whose __init__ method takes variadic arguments (the file names).

A custom_open method is defined within the class. It opens the file in append-and-read mode, writes some data to the file, and returns the file object.

The read method creates an instance of fileinput.FileInput, passing self.args as the files argument and setting the openhook parameter to self.custom_open. The contents of the files are then iterated over and printed.

Finally, an OpenMultipleFiles instance is created with the file names (test.txt and sample.txt) and stored in the obj variable. The read method is then invoked on obj to read the specified files.
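The class described in this section might look like the sketch below. The names OpenMultipleFiles, custom_open, read, and obj follow the text; the appended string, the file contents, the seek(0) call, and the returned list of lines are assumptions added so the example can be run and checked.

```python
import fileinput
import tempfile
from pathlib import Path

class OpenMultipleFiles:
    def __init__(self, *args):
        self.args = args  # file names to read

    def custom_open(self, filename, mode):
        # Opening hook: append a line, then rewind so reading starts at the top.
        file = open(filename, "a+", encoding="utf-8")
        file.write("Appended by custom_open\n")
        file.seek(0)
        return file

    def read(self):
        lines = []
        with fileinput.FileInput(files=self.args, openhook=self.custom_open) as stream:
            for line in stream:
                print(line, end="")
                lines.append(line)
        return lines  # returned only to make the result easy to inspect

# Illustrative input files.
workdir = Path(tempfile.mkdtemp())
paths = []
for name, text in [("test.txt", "first file\n"), ("sample.txt", "second file\n")]:
    path = workdir / name
    path.write_text(text, encoding="utf-8")
    paths.append(str(path))

obj = OpenMultipleFiles(*paths)
result = obj.read()
```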

Comparison

Let’s see how long it takes to process the contents of multiple files at the same time using the open() and the fileinput.input() function.

Using the timeit module, you can measure the time it takes to process the contents of multiple files 1000 times with the fileinput.input() function and with the open() function. The comparison helps determine which is more efficient.
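One way to set up such a measurement is sketched here. The helper names read_with_fileinput and read_with_open and the file contents are hypothetical. The timings depend on the machine and on file size, so no expected numbers are shown.

```python
import fileinput
import tempfile
import timeit
from pathlib import Path

# Illustrative files with a handful of lines each.
workdir = Path(tempfile.mkdtemp())
paths = []
for name in ("test.txt", "sample.txt"):
    path = workdir / name
    path.write_text("some text\n" * 50, encoding="utf-8")
    paths.append(str(path))

def read_with_fileinput():
    # One loop over all files.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            pass

def read_with_open():
    # One open() call per file.
    for path in paths:
        with open(path) as file:
            for line in file:
                pass

fileinput_time = timeit.timeit(read_with_fileinput, number=1000)
open_time = timeit.timeit(read_with_open, number=1000)

print(f"fileinput.input(): {fileinput_time:.4f} s")
print(f"open():            {open_time:.4f} s")
```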

Limitations

Every module is powerful in its own right but also has limitations, and the fileinput module is no exception.

  • It does not load a file as a whole; it only iterates through the contents line by line.
  • It cannot open files in write or append mode; the only way to modify a file is inplace=True, which redirects standard output into the file.
  • It cannot perform advanced file-handling operations.
  • It can be slower than using open() directly, which may become noticeable when processing large files.

Conclusion

The fileinput module provides functions to process one or more files line by line. The fileinput.input() function is the primary interface of the fileinput module, and its parameters give you more control over how the files are processed.

Let’s recall what you’ve learned:

  • An overview of the fileinput module
  • Basic usage of the fileinput.input() with and without context manager
  • The fileinput.input() function and its parameters with examples
  • A glimpse of FileInput class
  • Comparison of fileinput.input() function with open() function for processing multiple files simultaneously
  • Some limitations of the fileinput module



That’s all for now

Keep coding✌✌