External Files in Python

Open, read, and write files from Python

Quick facts

  • File operations take place in the following order: open the file, perform the operation (e.g., read or write), then close the file

  • The os module lets you rename or delete files: os.rename(<current file name>, <new file name>) or os.remove(<file name>)
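
A minimal sketch of renaming and then deleting a file with os.rename() and os.remove(). The file names draft.txt and final.txt are hypothetical, and the sketch works inside a fresh temporary folder (an assumption for the demo) so it does not touch your own files.

```python
import os
import tempfile

# Work in a fresh temporary folder so the demo doesn't touch real files
folder = tempfile.mkdtemp()
old_name = os.path.join(folder, "draft.txt")  # hypothetical file name
new_name = os.path.join(folder, "final.txt")  # hypothetical file name

# Create a small file so there is something to rename
with open(old_name, "w") as f:
    f.write("hello")

os.rename(old_name, new_name)                          # draft.txt -> final.txt
print(os.path.exists(old_name), os.path.exists(new_name))  # False True

os.remove(new_name)                                    # delete final.txt
print(os.path.exists(new_name))                        # False
```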

  • Python has a built-in function called open() to open files; the syntax is open(<file name>, <mode>)

  • Assign the result of open() to a variable; the variable holds a file object (not the file’s contents), which you then use to read, write, and close the file

  • When you refer to a file by its name only (vs. its full path), Python looks for it in the current working directory, which is usually, but not always, the folder the program is run from

  • It is important to close the file after you finish working with it; otherwise written data may stay in a buffer and not be saved to disk. The syntax is <variable>.close()

    • If an exception occurs while working with the file, the code exits before the file is closed; to prevent this, use a try/finally statement (same structure as try/except): call open() just before “try:”, put the file operations after “try:”, and put <variable>.close() after “finally:”, which always runs

  • Think of open and close as the header and footer
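
A minimal sketch of the open/operate/close pattern protected by try/finally. The file name notes.txt is hypothetical and lives in a temporary folder. open() is placed before try so that, if opening itself fails, finally never tries to close a file that was never opened.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

# Create the file first so there is something to read
f = open(path, "w")
f.write("first line\n")
f.close()

f = open(path, "r")        # "header": open the file
try:
    contents = f.read()    # file operation(s)
finally:
    f.close()              # "footer": runs even if read() raised an exception

print(contents)            # first line
print(f.closed)            # True
```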

  • Alternatively, you can open files with a with statement: with open(<file name>, <mode>) as <variable>: <next line + indent>

    • No need to call .close() here, because the with statement closes the file automatically when the indented block ends, even if an exception occurs
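
A short sketch of the with statement. The file name notes.txt is hypothetical. Once the indented block ends, the file object reports closed == True without any explicit .close() call.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")  # hypothetical file name

with open(path, "w") as f:          # the file is open inside the indented block
    f.write("written inside a with block\n")

# Leaving the block closed the file automatically
print(f.closed)                     # True

with open(path) as f:               # default mode is "r" (read, text)
    text = f.read()
print(text)                         # written inside a with block
```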

  • The default mode when opening a file is reading in text mode (‘r’, the same as ‘rt’), which returns strings from the file

  • Different modes when opening a file:
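    ▸ ‘r’ = read-only (default)
    ▸ ‘w’ = write (creates the file, or erases its contents if it already exists)
    ▸ ‘a’ = append (writes go to the end of the file; creates the file if it doesn’t exist)
    ▸ ‘t’ = text mode (default)
    ▸ ‘x’ = exclusive creation (if the file already exists, the operation fails)
    ▸ ‘b’ = binary mode (returns bytes; used for non-text files like images or executable files)
    ▸ ‘r+’ = read and write (the file must exist; its contents are not erased, and writing starts at the current position, which is the beginning right after opening)
    ▸ ‘r+b’ = read and write in binary mode
    ▸ ‘wb’ = write in binary mode
    ▸ ‘rb’ = read in binary mode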
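
A sketch that runs several of the modes above against one hypothetical file (modes.txt in a temporary folder), so the differences show up in the final contents: ‘x’ refuses an existing file, ‘a’ appends, ‘r+’ overwrites from the start rather than appending, ‘rb’ returns bytes, and ‘w’ erases everything.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

with open(path, "x") as f:      # "x": create; fails if the file exists
    f.write("abc")

try:
    open(path, "x")             # a second exclusive creation fails
except FileExistsError:
    print("x mode: file already exists")

with open(path, "a") as f:      # "a": writes go to the end -> "abcdef"
    f.write("def")

with open(path, "r+") as f:     # "r+": keeps contents, writes from position 0
    f.write("XY")               # overwrites "ab" -> "XYcdef"

with open(path, "rb") as f:     # "rb": read raw bytes
    data = f.read()
print(data)                     # b'XYcdef'

with open(path, "w") as f:      # "w": erases everything first
    f.write("new")

with open(path) as f:
    final = f.read()
print(final)                    # new
```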


File Types

  • Generally, Python works with two types of files:

    1. Text

    • Web standards: HTML, XML, CSS, JSON etc.

    • Source code: c, cpp, js, py, java etc.

    • Documents: txt, tex, RTF etc.

    • Tabular data: csv, tsv etc.

    • Configuration: ini, cfg, reg etc.

    2. Binary

    • Document files: .pdf, .doc, .xls etc.

    • Image files: .png, .jpg, .gif, .bmp etc.

    • Video files: .mp4, .3gp, .mkv, .avi etc.

    • Audio files: .mp3, .wav, .mka, .aac etc.

    • Database files: .mdb, .accde, .frm, .sqlite etc.

    • Archive files: .zip, .rar, .iso, .7z etc.

    • Executable files: .exe, .dll, .class etc.

  • The pickle module converts a Python object into a byte stream so you can store it in a file or database and load it back later

    • The difference between loading a pickle file and importing a module is that a pickle file saves the state of the objects; when you import a module, you start from a clean slate and have to run everything from scratch

    • pickle files are useful for saving objects such as trained machine learning models
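
A minimal pickle round trip, assuming a small dictionary stands in for the object whose state you want to keep (both the dictionary and the file name state.pkl are hypothetical). Pickle data is bytes, so both steps use binary mode. Only load pickle files from sources you trust, because unpickling can run arbitrary code.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "state.pkl")  # hypothetical file name

# Hypothetical object whose state we want to keep between runs
state = {"epoch": 3, "weights": [0.1, 0.2, 0.3]}

with open(path, "wb") as f:     # pickle writes bytes, so use binary mode
    pickle.dump(state, f)

with open(path, "rb") as f:     # load it back later, also in binary mode
    restored = pickle.load(f)

print(restored)                 # {'epoch': 3, 'weights': [0.1, 0.2, 0.3]}
```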

  • File encoding is the process of converting characters into a specific byte format that a machine can store and read. The following operating systems use these encodings by default:

    • Microsoft Windows OS — cp1252

    • Linux or Unix OS — utf-8

    • Apple’s MAC OS — utf-8 or utf-16

  • You can set the encoding when you call open(); the syntax is <variable> = open(<file name>, mode=<mode>, encoding=<encoding format>)

  • You can inspect a file object’s attributes with print(): <variable>.name for the file name, <variable>.mode for the mode, <variable>.encoding for the encoding, and <variable>.closed for a Boolean that shows whether the file is closed
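
A sketch that passes the encoding explicitly and prints the file attributes listed above. The file name greeting.txt is hypothetical. Reading the same file back in binary mode shows that, in UTF-8, the character é is stored as two bytes.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # hypothetical file name

# Passing encoding explicitly makes the result independent of the OS default
with open(path, mode="w", encoding="utf-8") as f:
    f.write("café")
    print(f.name)      # full path ending in greeting.txt
    print(f.mode)      # w
    print(f.encoding)  # utf-8
    print(f.closed)    # False

print(f.closed)        # True

with open(path, mode="rb") as f:
    raw = f.read()
print(raw)             # b'caf\xc3\xa9'
```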


Reading Files

  • Check whether the file is readable with print(<variable>.readable()), which returns True or False

    • it always returns False if the file is opened in “w” (write) or “a” (append) mode, and read methods such as .read() and .readline() will raise an error

    • to read, make sure the file is opened in “r” (read) or “r+” (read and write) mode
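
A small sketch of readable() in practice, using a hypothetical log.txt. In “w” mode readable() is False and calling .readline() raises io.UnsupportedOperation; reopening the same file in “r+” mode makes it readable.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file name

with open(path, "w") as f:
    w_readable = f.readable()          # False: "w" is write-only
    read_failed = False
    try:
        f.readline()                   # reading a write-only file is not allowed
    except io.UnsupportedOperation:
        read_failed = True

with open(path, "r+") as f:            # the file now exists, so "r+" can open it
    rplus_readable = f.readable()      # True

print(w_readable, read_failed, rplus_readable)  # False True True
```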

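  • In text mode, the output is a string; line breaks are stored as ‘\n’ characters, which you see when the string is displayed inside a list (e.g., the result of .readlines()) or with repr()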

  • print(<variable>.read(<size>)) reads and prints the contents of the file

    • if size is not specified, it returns everything from the current position to the end of the file

    • <variable>.read(4) returns the next four characters (or four bytes in binary mode); on a freshly opened file, these are the first four
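
A sketch of read() with and without a size, on a hypothetical two-line poem.txt. repr() is used so the ‘\n’ characters stay visible; note that the second read() picks up where the first one stopped.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("Roses are red\nViolets are blue\n")

with open(path) as f:
    first_four = f.read(4)   # the first four characters
    rest = f.read()          # everything after those four characters

print(repr(first_four))      # 'Rose'
print(repr(rest))            # 's are red\nViolets are blue\n'
```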

  • print(<variable>.readline()) will return the first line of content from the file; if you run it again, it will return the next line

  • print(<variable>.readlines()) returns all the lines of the file as a list of strings

    • you can assign the list to a variable and keep using list operations on it after closing the file

    • you can pull a specific line by running print(<variable>.readlines()[<index>])

    • you can run a ‘for loop’ to print each line, e.g., for line in <variable>.readlines(): <next line + indent> print(line); you can also loop over the file object directly with for line in <variable>:
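
A sketch comparing readline(), readlines(), and a for loop on a hypothetical todo.txt. Each line keeps its trailing ‘\n’, so the loop strips it before printing; otherwise print() would add a second line break and leave blank lines between items.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "todo.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("buy milk\nwalk dog\ncall mom\n")

with open(path) as f:
    line1 = f.readline()     # 'buy milk\n'
    line2 = f.readline()     # 'walk dog\n' (continues where the last call stopped)

with open(path) as f:
    lines = f.readlines()    # ['buy milk\n', 'walk dog\n', 'call mom\n']

# The list is still usable after the file is closed
print(lines[2])              # call mom

for line in lines:
    print(line.rstrip("\n"))  # strip '\n' so print() doesn't add blank lines
```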

  • <variable>.tell() will return the current file position

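  • <variable>.seek(<offset>, <whence>) moves the position by <offset> relative to the reference point chosen by <whence> (default 0)

    • whence 0 - offset is measured from the beginning of the file

    • whence 1 - offset is measured from the current cursor position

    • whence 2 - offset is measured from the end of the file

    • in text mode, whence 1 and 2 only accept an offset of 0; open the file in binary mode for other relative seeks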
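
A sketch of tell() and seek() with all three whence values, using a hypothetical digits.bin opened in binary mode (where nonzero relative offsets are allowed). The file holds the ten bytes 0 to 9, so each read shows where the cursor landed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.read(3)
    pos = f.tell()            # 3: three bytes have been read

    f.seek(0)                 # whence 0: from the beginning
    start = f.read(1)         # b'0' (cursor is now at 1)

    f.seek(2, 1)              # whence 1: 2 bytes past the current position -> 3
    middle = f.read(1)        # b'3'

    f.seek(-2, 2)             # whence 2: 2 bytes before the end -> 8
    tail = f.read()           # b'89'

print(pos, start, middle, tail)  # 3 b'0' b'3' b'89'
```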


Writing Files

  • Open the file in “a” (append) or “w” (write) mode to write to it

    • if you get an encoding error, you can open the file in “wb” mode and write bytes objects (e.g., “<text>”.encode(“utf-8”)) instead of strings; when you read this file again, use “rb” mode
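
A sketch of the “wb”/“rb” round trip, assuming UTF-8 as the encoding and a hypothetical message.bin. In binary mode .write() takes bytes, so the string is encoded first and decoded again after reading.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "message.bin")  # hypothetical file name
text = "naïve résumé"

with open(path, "wb") as f:
    f.write(text.encode("utf-8"))   # binary mode needs bytes, not str

with open(path, "rb") as f:
    data = f.read()                 # bytes come back

print(data.decode("utf-8"))         # naïve résumé
```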

  • Check whether the file is writable with the <variable>.writable() method, which returns True or False

  • <variable>.write(“<text>”) writes the text into the file

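  • if the file was opened in “a” (append) mode, the new text is added to the end of the file; in “r+” (read and write) mode, writing starts at the current position (the beginning, unless you have already read or used seek()) and overwrites existing characters

    • if you run the append code again, the same text is added to the end of the file again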

  • if the file was opened in “w” mode, the new text overwrites all the contents of the file; the previous content is erased as soon as the file is opened

    • if the file doesn’t exist, this creates a new file (in the current working directory if you give only a name)
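
A sketch contrasting “a” and “w” on a hypothetical diary.txt. It also shows that writable() is True in “w” mode and that .write() returns the number of characters written. Running the append twice adds the text twice, and reopening in “w” mode wipes it all.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "diary.txt")  # hypothetical file name

with open(path, "w") as f:            # creates the file (or erases an existing one)
    can_write = f.writable()          # True
    count = f.write("Day 1\n")        # returns the number of characters written
print(count)                          # 6

for _ in range(2):                    # running the append code twice adds the text twice
    with open(path, "a") as f:
        f.write("More notes\n")

with open(path) as f:
    after_append = f.read()
print(after_append)

with open(path, "w") as f:            # "w" again: the old content is erased
    f.write("Fresh start\n")

with open(path) as f:
    final = f.read()
print(final)                          # Fresh start
```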

  • the written text isn’t printed to the screen, but you can see the changes when you open the file again (.write() itself returns the number of characters written)

  • .write() doesn’t add line breaks for you; use special characters like ‘\n’ to start a new line

  • you can create a web page from Python by writing a new file with an .html extension and passing HTML code as the string to the .write() method
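
A minimal sketch of writing a web page, assuming a hypothetical index.html in a temporary folder and a tiny hand-written page. Each line ends with an explicit ‘\n’ because .write() does not add line breaks. Opening the resulting file in a browser should display the heading.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index.html")  # hypothetical file name

# write() adds no line breaks, so each line ends with an explicit "\n"
with open(path, "w", encoding="utf-8") as f:
    f.write("<!DOCTYPE html>\n")
    f.write("<html>\n")
    f.write("<body><h1>Hello from Python</h1></body>\n")
    f.write("</html>\n")

with open(path, encoding="utf-8") as f:
    page = f.read()
print(page)
```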