Input and Output (I/O)#

Learning Objectives#

  • Understand the basic concepts of Input/Output (I/O) operations in Julia.

  • Perform file handling operations: opening, reading, writing, and closing files.

  • Optionally, explore Julia packages for handling specific file formats efficiently.

Overview of I/O#

Input/Output (I/O) operations involve reading data into the running Julia process from external sources (input) and writing data out from the process (output). Reading data from and writing data to files on disk is a typical example, and it is essential for tasks like data processing, configuration and logging. Julia provides built-in functions for basic text and binary file I/O, and packages are available for more complex formats (CSV, Excel, JSON, etc.).

General approach#

In Julia, working with files typically means calling the open() function together with read and write, or using higher-level convenience functions provided by packages.

The typical steps for file I/O in any language are:

  • Open the file (specifying a mode like read or write)

  • Read from or write to the file.

  • Close the file to ensure resources are freed and data is flushed to disk.

In Julia, you can open a file using open("filepath", "r") for reading, "w" for writing (which creates the file, or truncates it if it already exists), or "a" for appending to the end. The result of open is an IO stream object that you can use with read, write, etc. It's important to close the file when you are done. Although Julia will eventually close a file when its stream object is garbage-collected, the timing of garbage collection is unpredictable, so it's best to close the file as soon as you're done with it.
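As a minimal sketch of the three modes and the explicit open/close pattern, the code below works on a hypothetical file modes.txt inside a fresh temporary directory created with mktempdir, so nothing in your project is touched:

```julia
# Hypothetical scratch file in a fresh temporary directory
path = joinpath(mktempdir(), "modes.txt")

io = open(path, "w")      # "w": create the file (or truncate it if it already exists)
println(io, "first line")
close(io)

io = open(path, "a")      # "a": keep the existing contents and append at the end
println(io, "second line")
close(io)

io = open(path, "r")      # "r": open for reading only
text = read(io, String)   # read everything into a single String
close(io)

isopen(io)                # false: the stream has been closed
print(text)
```

Opening in "w" mode a second time would have discarded "first line"; "a" keeps it.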

Working with text files#

Reading files#

Suppose we have a text file located at data/numbers.txt containing a list of numbers, one per line:

1
-4.3
42
3.14

One way we could read this file is as follows:

# Open a file for reading
file = open("data/numbers.txt", "r")

# Read contents of file into a Vector of Strings
# (each Vector element is a line from the file)
numbers = readlines(file)

# Close the file
close(file)

println(numbers)
["1", "-4.3", "42", "3.14"]

This code opens the file numbers.txt in the folder data (a path relative to the current working directory), reads the lines of the file into a Vector{String} and then immediately closes it.

It is a very common pattern to (1) open a file, (2) do something with its contents, (3) close the file. Because of this, it is preferable to use a slightly different form of open:

open("path/to/file", mode) do file  # mode = "r", "w", "a", etc.
    # Do something with the IO stream `file`
    # and return the last value in this block
end

This version performs the following:

  1. opens the file and assigns the corresponding IO stream to file;

  2. evaluates the contents between do file ... end, just like the body of an anonymous function with file as its argument; then

  3. closes the file and returns the value of the last expression in the do ... end block.

So our code above for reading in the numbers from data/numbers.txt could equivalently be written as:

# Using open to read the lines
numbers = open("data/numbers.txt", "r") do file
    readlines(file)
end
println(numbers)
["1", "-4.3", "42", "3.14"]

Note: Using the do block form of open is a safe pattern that ensures the file is closed even if an error occurs during reading or writing.

If the above code seems a bit too much like magic to you, read about the do ... end syntax and also check out the Julia manual section on streaming files.
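To make the "anonymous function" view concrete, here is a small sketch showing three equivalent calls. The do block is syntax for passing a function as the first argument of open, and open then applies that function to the stream and closes it afterwards. The file is a hypothetical copy of numbers.txt written to a temporary directory:

```julia
path = joinpath(mktempdir(), "numbers.txt")
write(path, "1\n-4.3\n42\n3.14\n")    # create a small sample file

# 1) do-block form
a = open(path, "r") do file
    readlines(file)
end

# 2) the same call, with an explicit anonymous function as the first argument
b = open(file -> readlines(file), path, "r")

# 3) any one-argument function can be passed directly
c = open(readlines, path, "r")

a == b == c    # true
```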

Writing files#

To write to a text file, we use the print function together with open (make sure to open the file in write mode "w" or append mode "a" as required!). Alternatively, we can use println, which writes a newline character after its arguments. For example, to write the string "Hello, Julia!" to a text file called greeting.txt, we could run:

open("greeting.txt", "w") do file
    print(file, "Hello")  # no newline
    println(file, ", Julia!")  # with newline
end

Note that print and println also work with non-string data: they write a textual representation of the object to the file. For example, we could have created our numbers.txt file as follows:

open("data/numbers.txt", "w") do file
    # A tuple keeps each element's own type; a Vector would promote everything
    # to Float64, so 1 and 42 would be written as "1.0" and "42.0"
    for x in (1, -4.3, 42, 3.14)
        println(file, x)
    end
end
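print and println also accept several arguments and write them one after another, which is handy for simple column-style output. A minimal sketch, using a hypothetical file squares.txt in a temporary directory:

```julia
path = joinpath(mktempdir(), "squares.txt")

open(path, "w") do file
    for x in (1, 2.5, 3)
        println(file, x, " ", x^2)   # value, a space, its square, then a newline
    end
end

lines = readlines(path)   # ["1 1", "2.5 6.25", "3 9"]
```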

Reading text files: helpers#

Julia provides several convenience functions for reading text files that take the file path directly, so you don't need to call open yourself.

Read the whole file into a single String using the read function:

# Read whole file as a string
contents = read("data/numbers.txt", String)
contents
"1\r\n-4.3\r\n42\r\n3.14"

Read the lines of the file into a Vector{String} using the readlines function:

# Read lines to a vector of strings
lines = readlines("data/numbers.txt")
lines
4-element Vector{String}:
 "1"
 "-4.3"
 "42"
 "3.14"
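Notice that the strings returned by readlines contain no "\r" even though this file uses "\r\n" line endings (see the output of read above): by default, readlines strips a trailing "\n" or "\r\n" from each line, and the keep = true keyword retains it. The sketch below uses an IOBuffer as an in-memory stand-in for the file:

```julia
# In-memory stand-in for a file saved with Windows ("\r\n") line endings
buf = "1\r\n-4.3\r\n42\r\n3.14"

stripped = readlines(IOBuffer(buf))               # line endings removed
kept     = readlines(IOBuffer(buf); keep = true)  # line endings retained
```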

Iterate through the file one line at a time using the eachline function. This approach is memory-efficient for large files, because only one line is held in memory at a time:

for line in eachline("data/numbers.txt")
    println(line)
end
1
-4.3
42
3.14
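eachline also works on an open stream, which combines nicely with the do-block form when you may stop reading early. The sketch below (using a hypothetical copy of numbers.txt in a temporary directory) finds the first negative number and stops there. Calling eachline inside open guarantees the file is closed on that early exit; when eachline is given a path and the loop is interrupted, Julia only closes the file once the iterator is garbage-collected:

```julia
path = joinpath(mktempdir(), "numbers.txt")
write(path, "1\n-4.3\n42\n3.14\n")

# Stop at the first negative number; only one line is read at a time
first_negative = open(path, "r") do file
    for (lineno, line) in enumerate(eachline(file))
        x = parse(Float64, line)
        if x < 0
            return (lineno, x)   # leaves the do block early; open still closes the file
        end
    end
    return nothing               # no negative number found
end
```

Here first_negative is the tuple (line number, value) of the match.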

Exercise: Reading and writing text files#

Create your own text file of numbers locally on your machine, one number per line. Write code to read in the numbers, sum them and write the result to a new text file. Hint: you may want to make use of the parse function.

# Answer here

Reading and Writing Binary Files#

Reading binary data#

Sometimes you need to work with raw bytes or store structured numeric data compactly.

Recall that we could use the read function to read a text file into a String:

# Read data as Strings
contents = read("data/numbers.txt", String)  # don't forget to specify String
contents
"1\r\n-4.3\r\n42\r\n3.14"

If we leave out the String argument in the call to read, we get something that at first glance looks rather different:

# Read data (not as Strings)
contents = read("data/numbers.txt")
contents
17-element Vector{UInt8}:
 0x31
 0x0d
 0x0a
 0x2d
 0x34
 0x2e
 0x33
 0x0d
 0x0a
 0x34
 0x32
 0x0d
 0x0a
 0x33
 0x2e
 0x31
 0x34

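In fact, this is the sequence of bytes that makes up our file. A byte can be expressed as an unsigned 8-bit integer (UInt8), and the output above uses Julia's literal syntax for UInt8 values, which is written in hexadecimal. For example, 0x31 is the character '1', and each line break appears as the pair 0x0d 0x0a (i.e. "\r\n") because this file was saved with Windows-style line endings. Some examples of the hexadecimal literal syntax: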

0x05 == UInt8(5)
0x0a == UInt8(10)
0x0f == UInt8(15)
0x34 == UInt8(3 * 16 + 4) == UInt8(52)
0xff == UInt8(255)

The key thing to remember is:

  • read(file): return the contents of file as a Vector of UInt8 (i.e. of bytes)

  • read(file, String): return the contents of file as a string.

We see here that read(file) really just views the contents of the file as a stream of data – all context about what that data represents is lost. It’s up to us to ensure we know how to interpret that stream of data (which depends on what we’re trying to do).
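As a small sketch of interpreting the raw bytes ourselves, the code below reads a hypothetical copy of numbers.txt (written to a temporary directory with Unix-style "\n" line endings) as bytes and then decodes them as text:

```julia
path = joinpath(mktempdir(), "numbers.txt")
write(path, "1\n-4.3\n42\n3.14")     # Unix-style "\n" line endings this time

bytes = read(path)                   # Vector{UInt8}: no interpretation applied
length(bytes)                        # 14 (11 characters + 3 newline bytes)

Char(bytes[1])                       # '1': 0x31 is the code for the character '1'
bytes[2] == UInt8('\n')              # true: 0x0a is the newline byte

# Decode the bytes as UTF-8 text; copy first, because String(v) may take
# ownership of (and empty) the vector it is given
text = String(copy(bytes))
text == read(path, String)           # true
```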

Writing a Binary File#

The function write(file, content) can be used to write the canonical binary representation of some object content to the given file. Note that file can either be a string giving a file path (in which case the file is created, or overwritten if it exists) or an IO stream created via open. The function write returns the number of bytes written.
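A minimal sketch of both forms and their return values, using a hypothetical file demo.bin in a temporary directory:

```julia
path = joinpath(mktempdir(), "demo.bin")

n1 = write(path, "Hello")            # path form: creates/overwrites the file; returns 5

n2 = open(path, "a") do io           # stream form, here appending
    write(io, UInt8[0x21, 0x0a])     # raw bytes for '!' and '\n'; returns 2
end

n3 = write(path * ".f64", 3.14)      # a Float64 is written as 8 bytes

read(path, String)                   # "Hello!\n"
```

The do block returns the value of its last expression, so n2 holds the byte count from the inner write.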

Technical note: endianness

Strictly speaking, write emits the bytes of a value in the byte order (the endianness) native to your machine. This matters whenever a value occupies more than one byte (e.g. an Int32 occupies 4 bytes): those bytes can be stored most significant byte first (big-endian) or least significant byte first (little-endian). Virtually all current desktop and server hardware, including x86-64 and Apple silicon, is little-endian. See the Wikipedia entry on endianness for more details.
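The following sketch makes the byte order visible. It writes Int32(1) to an in-memory IOBuffer and compares the result with explicit little- and big-endian layouts produced by Base's htol (host to little-endian) and hton (host to network, i.e. big-endian) functions. ENDIAN_BOM is a Base constant that equals 0x04030201 on little-endian machines:

```julia
x = Int32(1)

io = IOBuffer()
write(io, x)                      # write uses the machine's native byte order
native = take!(io)                # the 4 bytes that were written

# View each converted value's memory as bytes
little_bytes = collect(reinterpret(UInt8, [htol(x)]))  # UInt8[0x01, 0x00, 0x00, 0x00]
big_bytes    = collect(reinterpret(UInt8, [hton(x)]))  # UInt8[0x00, 0x00, 0x00, 0x01]

# true on any machine: native order matches whichever layout the host uses
native == (ENDIAN_BOM == 0x04030201 ? little_bytes : big_bytes)
```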

Below are two examples: one for writing a Vector{UInt8} of arbitrary bytes, and another for writing a Vector{Int32} in your machine’s native byte order.

  • write(io, bytes::Vector{UInt8}) emits each byte exactly as it appears in memory.

  • write(io, numbers::Vector{Int32}) emits each 32-bit integer (in your machine’s native endianness).

# Prepare some sample data
bytes   = UInt8[0x01, 0xFF, 0x10, 0x20]    # raw bytes
numbers = Int32[1, -2, 3]                  # three 32-bit ints

# 1) Write raw bytes to raw.bin
open("data/raw.bin", "w") do file
    write(file, bytes)
end

# 2) Write the Int32 array to ints.bin (native endianness)
open("data/ints.bin", "w") do file
    write(file, numbers)
end
12

Note from the output that 12 bytes were written to data/ints.bin for the Int32 array: can you see why?
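To read such a file back, we have to supply the type information that write discarded. One possible sketch, an alternative to the reinterpret approach in the exercise below, reads one Int32 at a time with read(io, Int32) until the end of the stream. It uses a hypothetical ints.bin in a temporary directory:

```julia
path = joinpath(mktempdir(), "ints.bin")
numbers = Int32[1, -2, 3]
write(path, numbers)

# Read the values back one by one; we must know the element type in advance
back = open(path, "r") do io
    vals = Int32[]
    while !eof(io)
        push!(vals, read(io, Int32))
    end
    vals
end

back == numbers   # true
```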

Exercise: interpreting binary files#

This exercise walks you through the process of serializing and deserializing an array of Float64 numbers, using the binary I/O described above together with the reinterpret function.

  1. Create a 1D array of Float64 and use write to save this array to a file called array.bin.

  2. The reinterpret function can be used to take a byte buffer (i.e. a Vector{UInt8}) and view it as an array with elements of a different type. (This is a zero-copy operation: the same memory is simply interpreted differently.) For example, if

    bytes = UInt8[0x1f, 0x85, 0xeb, 0x51, 0xb8, 0x1e, 0x09, 0x40]
    

    then, on a little-endian machine (which includes virtually all modern desktops and laptops), this sequence of 8 bytes can be interpreted as the length-1 vector [3.14] of Float64:

    julia> bytes = UInt8[0x1f, 0x85, 0xeb, 0x51, 0xb8, 0x1e, 0x09, 0x40];
    julia> reinterpret(Float64, bytes)
    1-element reinterpret(Float64, ::Vector{UInt8}):
     3.14
    

    Read in the data from array.bin as a vector of bytes and use reinterpret to interpret it as a vector of Float64. Assign the resulting vector to a variable data.

  3. Verify that the contents of data is equal to the contents of the original 1D array that you created in part 1.

# Answer here

Remember: closing files and resources#

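Always remember to close files when you are done with them. The do block form of open is a safe pattern that guarantees the file is closed even if an error occurs during reading or writing. Similarly, read, readlines and write close the file themselves when given a file path (rather than an IO stream), even in the face of errors. eachline given a path closes the file when iteration finishes, but if the loop is interrupted (by break or an error) the file is only closed once the iterator is garbage-collected; for loops that may exit early, call eachline on a stream inside an open do block.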
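A minimal sketch demonstrating that guarantee: an error is raised deliberately inside the do block, and afterwards we check that the stream was closed anyway. The file log.txt in a temporary directory and the Ref used to keep hold of the stream are purely illustrative:

```julia
path = joinpath(mktempdir(), "log.txt")
stream = Ref{IO}()                    # lets us inspect the stream after the block exits

try
    open(path, "w") do io
        stream[] = io
        println(io, "partial result")
        error("something went wrong") # simulate a failure mid-write
    end
catch err
    println("caught: ", sprint(showerror, err))
end

isopen(stream[])                      # false: open closed the file despite the error
read(path, String)                    # "partial result\n": data was flushed on close
```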

Overview of Common File Formats#

Julia supports a variety of file formats for data storage and manipulation. Below is an overview of some commonly used file formats along with links to detailed resources for each format.

Delimited Files#

Delimited text files (CSV, TSV, etc.) can be handled by the DelimitedFiles package, which was a built-in standard library in older Julia versions and, from Julia 1.9 onwards, must be installed explicitly. It provides the readdlm and writedlm functions for reading and writing numeric and string arrays with customizable delimiters.

Delimited Files in Julia

CSV Files#

CSV (Comma-Separated Values) files are one of the most widely used formats for storing tabular data. Julia provides excellent support for reading and writing CSV files using the CSV.jl package.

CSV Files in Julia

JSON Files#

JSON (JavaScript Object Notation) is a lightweight data interchange format. Julia can handle JSON files using the JSON.jl package.

JSON Files in Julia

HDF5 Files#

HDF5 (Hierarchical Data Format version 5) is designed to store large amounts of data. The HDF5.jl package provides support for reading and writing HDF5 files in Julia.

HDF5 Files in Julia

Parquet Files#

Parquet is a columnar storage file format optimized for use with data analytics. The Parquet.jl package allows for reading and writing Parquet files in Julia.

Parquet Files in Julia

NetCDF Files#

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats. The NCDatasets.jl package is used for working with NetCDF files in Julia.

NetCDF Files in Julia

BSON Files#

BSON (Binary JSON) is a binary representation of JSON-like documents. Julia supports BSON files using the BSON.jl package.

BSON Files in Julia

JLD2 Files#

JLD2.jl provides a native-Julia serialization format (built on a subset of HDF5) for saving and loading arbitrary Julia data structures, without the need for the external HDF5 C library.

By using the above links, you can explore more about each file format and learn how to effectively use them in Julia.

Exercise: Exploring Other File Formats#

This is an opportunity to explore a file format that aligns with the kinds of data you typically work with, or hope to work with, in your own projects.

  1. Choose one file format from the list above (e.g., CSV, JSON, HDF5, NetCDF) that you believe will be most relevant to your work beyond this course.

  2. Visit the package's online documentation and read a bit about it, such as how to read and write the file format(s) it supports.

  3. (You may want to come back to this part once you’ve completed the episode on package management) Install the package and use it to:

    • Write a small example dataset to a file in that format.

    • Read the file back into Julia and print the contents to verify the data roundtrip worked.

    Use the do block form of open if applicable, to ensure the file is safely closed.

Some ideas:

  • If you’re working with tabular data, try CSV.jl.

  • For structured configuration or data exchange, JSON.jl might be a good fit.

  • If you deal with scientific datasets, explore NetCDF or HDF5.

End of Section Quiz#

Which of the following is the most memory-efficient way to read a very large text file line-by-line in Julia?

In Julia, what is the recommended way to ensure a file is always closed after reading or writing, even if an error occurs during processing?