External files in Python
Open, read, and write files from Python
Quick facts
File operations take place in the following order: open the file, perform the operation (e.g., read or write), close the file
The os module lets you rename or delete files: os.rename(<current file name>, <new file name>) or os.remove(<file name>)
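A minimal sketch of renaming and deleting with the os module. The file names and the temporary working directory are made up for illustration so that nothing real gets touched.

```python
import os
import tempfile

# Work in a throwaway directory so nothing real is renamed or deleted.
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "draft.txt")   # hypothetical file name
new_path = os.path.join(workdir, "final.txt")   # hypothetical file name

# Create a small file to operate on.
with open(old_path, "w") as f:
    f.write("hello\n")

os.rename(old_path, new_path)             # draft.txt becomes final.txt
renamed = os.path.exists(new_path)
old_gone = not os.path.exists(old_path)

os.remove(new_path)                       # delete the file
removed = not os.path.exists(new_path)

os.rmdir(workdir)                         # remove the now-empty directory
```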
Python has a built-in function called open() to open files; the syntax is open(<file name>, <mode>)
Assign the result to a variable; it holds the file object that open() returns, which you then use to read or write
When you refer to a file by its name alone (vs. a full directory path), Python looks for it in the current working directory, which is often, but not always, the folder containing the program
It is important to close the file after performing operations on it, or else buffered writes may not be saved to disk; the syntax is <variable>.close()
If a file operation raises an exception, the code will exit before closing the file; to guard against this, use a try/finally statement (same structure as try/except) where the file operations come after “try:” and <variable>.close() comes after “finally:”
Think of open and close as the header and footer
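The try/finally pattern might look like the sketch below. The file name and the simulated error are assumptions for illustration; open() is called just before try so that f is always defined when the finally block runs, and an except clause is added only to keep the example running past the simulated failure.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

f = open(path, "w")
try:
    f.write("first line\n")
    # Simulate a failure in the middle of the file operations.
    raise ValueError("something went wrong")
except ValueError as err:
    error_message = str(err)
finally:
    # Runs whether or not an exception was raised.
    f.close()

closed_after_error = f.closed
```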
Alternatively, you can open files with a with statement: with open(<file name>, <mode>) as <variable>: <next line + indent>
No need to call .close() here because the file is closed automatically when the indented block ends, even if an exception is raised
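A short sketch of the with form, using a made-up file in a temporary directory. It records the file's closed attribute inside and after the block to show when closing happens.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # hypothetical file

with open(path, "w") as f:
    f.write("hello\n")
    open_inside_block = not f.closed   # still open inside the block

closed_after_block = f.closed          # closed automatically on exit

with open(path) as f:                  # default mode is "r" (read text)
    contents = f.read()
```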
The default mode when opening a file is reading in text mode which will return strings from the file
Different modes when opening a file:
▸ ‘r’ = read-only (default)
▸ ‘w’ = write
▸ ‘a’ = append
▸ ‘t’ = text mode
▸ ‘x’ = exclusive creation (if file already exists, operation will fail)
▸ ‘b’ = binary mode (returns bytes, dealing with non-text files like images or executable files)
▸ ‘r+’ = read and write (the file must already exist; writing starts at the beginning and does not erase the rest)
▸ ‘r+b’ = read and write in binary mode
▸ ‘wb’ = write file in binary mode
▸ ‘rb’ = read file in binary mode
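To make a few of these modes concrete, here is a small sketch that writes and reads bytes, shows exclusive creation failing on an existing file, and patches a single byte with ‘r+b’. The file name and byte values are placeholders.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.bin")  # hypothetical file

# "wb": write raw bytes (a bytes literal, not a str).
with open(path, "wb") as f:
    f.write(b"\x00\x01abc")

# "rb": read the same bytes back.
with open(path, "rb") as f:
    data = f.read()

# "x": exclusive creation fails because the file already exists.
try:
    with open(path, "x") as f:
        f.write("never written")
    x_failed = False
except FileExistsError:
    x_failed = True

# "r+b": read and write without truncating; writing starts at position 0.
with open(path, "r+b") as f:
    f.write(b"\xff")      # overwrites only the first byte
    f.seek(0)
    patched = f.read()
```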
File Types
Generally, there are two types of files in Python:
1. Text
Web standards: html, XML, CSS, JSON etc.
Source code: c, cpp, js, py, java etc.
Documents: txt, tex, RTF etc.
Tabular data: csv, tsv etc.
Configuration: ini, cfg, reg etc.
2. Binary
Document files: .pdf, .doc, .xls etc.
Image files: .png, .jpg, .gif, .bmp etc.
Video files: .mp4, .3gp, .mkv, .avi etc.
Audio files: .mp3, .wav, .mka, .aac etc.
Database files: .mdb, .accde, .frm, .sqlite etc.
Archive files: .zip, .rar, .iso, .7z etc.
Executable files: .exe, .dll, .class etc.
Pickle in Python converts a Python object into a byte stream so you can store it in a file or database and load it back later
A pickle file differs from importing a module in that the file saves the state of the objects; when you import a module, you start from a clean slate and have to run everything from scratch
Pickle files are good for machine learning objects, such as trained models
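A minimal pickle round trip might look like this. The dictionary is a stand-in for whatever object you want to keep (for example a trained model), and the file name is made up.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state.pkl")  # hypothetical file

# Any picklable object works; this dict is only a placeholder.
state = {"weights": [0.1, 0.2, 0.3], "epochs": 5}

with open(path, "wb") as f:       # pickle produces bytes, so use binary mode
    pickle.dump(state, f)

with open(path, "rb") as f:
    restored = pickle.load(f)     # a new object with the same contents
```

Only unpickle files you trust, since loading a pickle can run arbitrary code.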
File encoding is the mapping used to convert characters into bytes for storage. The following operating systems typically use these encodings by default:
Microsoft Windows: cp1252 (varies with the system locale)
Linux or Unix: utf-8
Apple’s macOS: utf-8
You can set the encoding in the open() call; the syntax is <variable> = open(<file name>, mode = <mode>, encoding = <encoding format>)
You can inspect the file object’s attributes with the print() function: <variable>.name for the name, <variable>.mode for the mode, <variable>.encoding for the encoding, and <variable>.closed for a Boolean that is True once the file is closed
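The sketch below passes an explicit encoding to open() and then inspects the file object's attributes. The file name and sample text are assumptions; utf-8 is chosen here so the result does not depend on the operating system's default.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cafe.txt")  # hypothetical file

# Pass the encoding explicitly instead of relying on the OS default.
with open(path, mode="w", encoding="utf-8") as f:
    f.write("caf\u00e9")          # "café", with a precomposed é

f = open(path, mode="r", encoding="utf-8")
name_matches = (f.name == path)
mode = f.mode                     # "r"
encoding = f.encoding             # the encoding passed to open()
closed_before = f.closed          # False while the file is open
text = f.read()
f.close()
closed_after = f.closed           # True after close()

# Four characters take five bytes on disk, because é is two bytes in UTF-8.
size_on_disk = os.path.getsize(path)
```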
Reading Files
Check if the file is readable by running print(<variable>.readable()) which will return True or False
it will always return False if the file is opened in “w” (write) or “a” (append) mode, and you cannot use .read(), .readline(), or .readlines()
make sure the file is opened as “r” (read) or “r+” (read and write)
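The output is a string; new lines are stored as ‘\n’, which print() renders as line breaks but which appear literally in the list returned by .readlines()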
print(<variable>.read(<size>)) will return the contents from the file
if size is not specified, the code will return all the contents of the file
<variable>.read(4) will return the first four characters in text mode (or the first four bytes in binary mode)
print(<variable>.readline()) will return the first line of content from the file; if you run it again, it will return the next line
print(<variable>.readlines()) will return all the lines of content from the file in list format
you can assign to a variable and perform list functions after closing the file
you can pull a specific line by running print(<variable>.readlines()[<index>])
you can run a ‘for loop’ to print each line; e.g. for line in file_name.readlines(): <next line + indent> print(line)
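The reading methods above can be sketched together as follows. The three-line sample file is made up, and the last loop iterates over the file object directly, which is a common alternative to looping over .readlines().

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    can_read = f.readable()       # True in "r" mode
    first_four = f.read(4)        # "alph"
    rest_of_line = f.readline()   # "a\n": continues from the current position
    next_line = f.readline()      # "beta\n"

with open(path) as f:
    all_lines = f.readlines()     # list of lines, each ending in "\n"

second_line = all_lines[1]        # the list is still usable after closing

# Iterating over the file object yields one line at a time.
stripped = []
with open(path) as f:
    for line in f:
        stripped.append(line.rstrip("\n"))

# A file opened for writing is not readable (this also empties the file).
with open(path, "w") as f:
    can_read_in_write_mode = f.readable()
```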
<variable>.tell() will return the current file position
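<variable>.seek(<offset>, <whence>) will allow you to change the position; <offset> is how far to move and the optional <whence> sets the reference point
whence 0 (default) - reference will be pointed at the beginning of the file
whence 1 - reference will be pointed at the current cursor position
whence 2 - reference will be pointed at the end of the file
in text mode, only seeks from the beginning (plus seek(0, 2) to jump to the end) are supported; open the file in binary mode to use whence 1 or 2 with a nonzero offset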
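A small sketch of tell() and seek() on a made-up ten-byte file. The values 0, 1, and 2 are passed as the second argument to seek() (named whence in the Python documentation), and the os module provides the constants os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END for them. Binary mode is used because text mode does not allow nonzero seeks relative to the current position or the end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    start = f.tell()             # 0: a freshly opened file starts at the beginning
    f.seek(3)                    # second argument defaults to 0 (from the start)
    from_start = f.read(2)       # b"34"
    after_read = f.tell()        # 5: reading advances the position
    f.seek(2, os.SEEK_CUR)       # 1 = relative to the current position, now at 7
    from_current = f.read(1)     # b"7"
    f.seek(-3, os.SEEK_END)      # 2 = relative to the end of the file
    from_end = f.read()          # b"789"
```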
Writing Files
Open the file in “a” (append) or “w” (write) mode to write to the file
if you get an error when writing non-text data, try opening in “wb” (write binary) mode, which expects bytes rather than strings; when you read this file again, use the “rb” mode
Check if the file is writable with the <variable>.writable() function which will return True or False
<variable>.write(“<text>”) is the code to write into the file
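if the file was opened in “a” (append) mode, the new text will be added to the end of the file; in “r+” (read and write) mode, writing starts at the current position (the beginning, right after opening) and overwrites the existing characters there
if you run the code again in “a” mode, the new text will be added to the end of the file again
if the file was opened in “w” mode, the new text will overwrite all the contents in the file; previous content will be erased
if the file doesn’t exist, “w” and “a” modes will create a new file (in the current directory when only a name is given); “r+” raises an error instead
you won’t see any output in a script (.write() only returns the number of characters written), but you can see the changes when you go back to the file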
use special characters for string features like new lines ‘\n’
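The differences between the write modes can be checked with a sketch like this one. The file name and text are placeholders; the file is reopened for reading after each step to capture its contents.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file

with open(path, "w") as f:         # "w" creates the file if it is missing
    can_write = f.writable()
    written = f.write("one\n")     # returns the number of characters written

with open(path, "a") as f:         # "a" adds to the end
    f.write("two\n")

with open(path) as f:
    after_append = f.read()

with open(path, "r+") as f:        # "r+" writes from position 0, in place
    f.write("ONE")

with open(path) as f:
    after_r_plus = f.read()

with open(path, "w") as f:         # "w" erases the previous contents
    f.write("three\n")

with open(path) as f:
    after_w = f.read()
```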
you can create webpages from Python by writing a new file with an .html extension; you can include HTML code within the .write() method
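As a rough illustration of the HTML idea, the sketch below writes a tiny page to a file with an .html extension. The file name, title, and markup are all placeholders; opening the resulting file in a browser would display the heading.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index.html")  # hypothetical file

title = "Hello from Python"   # placeholder content
page = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    f"  <head><title>{title}</title></head>\n"
    f"  <body><h1>{title}</h1></body>\n"
    "</html>\n"
)

with open(path, "w", encoding="utf-8") as f:
    f.write(page)

# Read it back to confirm what was saved.
with open(path, encoding="utf-8") as f:
    saved = f.read()
```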