Input and Output (I/O)#
Learning Objectives#
Understand the basic concepts of Input/Output (I/O) operations in Julia.
Perform file handling operations: opening, reading, writing, and closing files.
Potentially explore Julia packages for handling specific file formats efficiently.
Overview of I/O#
Input/Output (I/O) operations involve reading data into the running Julia process from external sources (input) and writing data out from the process (output). Reading data from and writing data to files on disk is a typical example, and is essential for tasks like data processing, configuration and logging. Julia provides built-in functions for basic text and binary file I/O and has packages for more complex formats (CSV, Excel, JSON, etc.).
General approach#
In Julia, interacting with files typically uses the open() function along with read and write for data, or higher-level convenience functions provided by packages.
The typical steps for file I/O in any language are:
Open the file (specifying a mode like read or write)
Read from or write to the file.
Close the file to ensure resources are freed, and data is flushed to disk.
In Julia, you can open a file using open("filepath", "r") for reading, "w" for writing (which creates the file, or truncates it if it already exists), or "a" for appending to the end. The result of open is an IO stream object that you can use with read, write, etc. It’s important to close the file when done. Although Julia will close the file when the stream object is garbage-collected, it’s best to close it explicitly as soon as we’re finished with it.
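As a minimal sketch of the open / use / close cycle described above, the following writes, appends to and then reads back a scratch file. The file name example.txt and the temporary directory are assumptions made for illustration; try ... finally is used so that close runs even if an operation throws.

```julia
# Scratch file in a fresh temporary directory (hypothetical name, for illustration only)
path = joinpath(mktempdir(), "example.txt")

io = open(path, "w")            # "w": create the file, or truncate an existing one
try
    write(io, "first line\n")
finally
    close(io)                   # runs even if the write throws
end

io = open(path, "a")            # "a": keep the existing contents, add to the end
try
    write(io, "second line\n")
finally
    close(io)
end

io = open(path, "r")            # "r": read-only
contents = try
    read(io, String)            # value of the try block becomes `contents`
finally
    close(io)
end

println(contents)
```

Writing the try ... finally by hand is verbose, which is one reason to prefer the shorter forms of open and read that handle closing for you.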
Working with text files#
Reading files#
Suppose we have a text file located at data/numbers.txt containing a list of numbers separated by new lines:
1
-4.3
42
3.14
One way we could read this file is as follows:
# Open a file for reading
file = open("data/numbers.txt", "r")
# Read contents of file into a Vector of Strings
# (each Vector element is a line from the file)
numbers = readlines(file)
# Close the file
close(file)
println(numbers)
["1", "-4.3", "42", "3.14"]
This code opens the file numbers.txt in the folder data (a relative path), reads the lines of the file into a Vector{String} and then immediately closes the file.
It is a very common pattern to (1) open a file, (2) do something with its contents, (3) close the file. Because of this, it is preferable to use a slightly different form of open:
open("path/to/file", mode) do file  # mode = "r", "w", "a", etc.
    # Do something with the IO stream `file`
    # and return the last value in this block
end
This version performs the following:
It opens the file and assigns the corresponding IO stream to file;
evaluates the contents between do file ... end, just like an anonymous function with file as the argument; then
returns the value of the last expression in the do ... end block and closes the file.
So our code above for reading in the numbers from data/numbers.txt could equivalently be written as:
# Using open to read the lines
numbers = open("data/numbers.txt", "r") do file
    readlines(file)
end
println(numbers)
["1", "-4.3", "42", "3.14"]
Note: Using the do block form of open is a safe pattern to ensure closure of the file even if an error occurs during reading/writing.
If the above code seems a bit too much like magic to you, read about the do ... end syntax and also check out the Julia manual section on streaming files.
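To take some of the magic out of the do block: it is just another way of passing a function as the first argument to open. The sketch below shows three equivalent spellings; the scratch file and its contents are assumptions made so the example is self-contained.

```julia
# Scratch copy of the numbers file (hypothetical path, Unix line endings)
path = joinpath(mktempdir(), "numbers.txt")
write(path, "1\n-4.3\n42\n3.14\n")

# 1) do-block form
a = open(path, "r") do file
    readlines(file)
end

# 2) The same call with an anonymous function passed explicitly as the first argument
b = open(file -> readlines(file), path, "r")

# 3) Passing an existing function by name
c = open(readlines, path, "r")

println(a == b == c)
```

In all three cases open calls the function with the IO stream, returns the function's result, and closes the stream afterwards.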
Writing files#
To write to a text file, we use the print function together with open (make sure to open the file in write mode "w" or append mode "a" as required!). Alternatively, we can use println to put a newline character after the string. For example, to write the string "Hello, Julia!" to a text file called greeting.txt, we could run:
open("greeting.txt", "w") do file
    print(file, "Hello")       # no newline
    println(file, ", Julia!")  # with newline
end
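Note that print and println also work with non-string data: they will write a textual representation of the object to the file. For example, we could have created a file like our numbers.txt as follows:
open("data/numbers.txt", "w") do file
    # A tuple keeps each element's type; the array literal [1, -4.3, 42, 3.14]
    # would promote everything to Float64 and write "1.0" and "42.0".
    for x in (1, -4.3, 42, 3.14)
        println(file, x)
    end
end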
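One detail to keep in mind when printing numbers: print and println write whatever string(x) produces, so the element type of the collection matters. The sketch below uses an in-memory IOBuffer as a stand-in for a file (an assumption made to keep the example self-contained) and compares an array literal with a tuple.

```julia
io = IOBuffer()                      # in-memory stream standing in for a file

for x in [1, -4.3, 42, 3.14]         # array literal: promoted to Vector{Float64}
    println(io, x)
end
from_array = String(take!(io))       # take! returns the bytes and resets the buffer

for x in (1, -4.3, 42, 3.14)         # tuple: each element keeps its own type
    println(io, x)
end
from_tuple = String(take!(io))

print(from_array)   # 1.0, -4.3, 42.0, 3.14 (one per line)
print(from_tuple)   # 1, -4.3, 42, 3.14 (one per line)
```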
Reading text files: helpers#
Julia provides some helper methods for reading data from text files that only require the file path, so there is no need to call open directly.
Read the file into a single String using the read function:
# Read whole file as a string
contents = read("data/numbers.txt", String)
contents
"1\r\n-4.3\r\n42\r\n3.14"
Read the lines of the file into a Vector{String} using the readlines function:
# Read lines to a vector of strings
lines = readlines("data/numbers.txt")
lines
4-element Vector{String}:
"1"
"-4.3"
"42"
"3.14"
Iterate through each line individually using the eachline function. Note that this approach is memory efficient for large files, because we only access the file one line at a time:
for line in eachline("data/numbers.txt")
    println(line)
end
1
-4.3
42
3.14
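The helpers above differ in how they treat line endings, which explains why "\r\n" shows up in the output of read but not in the output of readlines. The following sketch recreates the file in a temporary directory (the path is an assumption for illustration) with Windows-style line endings, matching the read output shown earlier.

```julia
# Scratch copy of the numbers file with Windows-style "\r\n" line endings
path = joinpath(mktempdir(), "numbers.txt")
write(path, "1\r\n-4.3\r\n42\r\n3.14")

whole    = read(path, String)            # one String, line endings preserved
stripped = readlines(path)               # line endings ("\n" or "\r\n") removed
kept     = readlines(path; keep=true)    # keep=true retains the line endings

# eachline yields one line at a time, so reductions like this never hold the whole file
longest = maximum(length, eachline(path))

println(stripped)
println(kept)
println(longest)
```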
Exercise: Reading and writing text files#
Create your own text file of numbers locally on your machine, one number per line. Write code to read in the numbers, sum them and write the result to a new text file. Hint: you may want to make use of the parse function.
# Answer here
Reading and Writing Binary Files#
Reading binary data#
Sometimes you need to work with raw bytes or store structured numeric data compactly.
Recall that we could use the read function to read a text file into a String:
# Read data as Strings
contents = read("data/numbers.txt", String) # don't forget to specify String
contents
"1\r\n-4.3\r\n42\r\n3.14"
If we leave out the String argument in the function call, we get something that at first glance looks rather different:
# Read data (not as Strings)
contents = read("data/numbers.txt")
contents
17-element Vector{UInt8}:
0x31
0x0d
0x0a
0x2d
0x34
0x2e
0x33
0x0d
0x0a
0x34
0x32
0x0d
0x0a
0x33
0x2e
0x31
0x34
In fact, this is a sequence of bytes that represents the contents of our file. Bytes can be expressed as unsigned 8-bit integers (UInt8); note that the output above uses Julia’s literal syntax for a UInt8, which is hexadecimal notation:
0x05 = UInt8(5)
0x0a = UInt8(10)
0x0f = UInt8(15)
0x34 = UInt8(3 * 16 + 4) = UInt8(52)
0xff = UInt8(255)
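To connect these hexadecimal values back to the file contents, the sketch below takes the first seven bytes from the output above and converts each one to the character it encodes. Only functions from Julia's Base are used.

```julia
# First seven bytes of the output above
bytes = UInt8[0x31, 0x0d, 0x0a, 0x2d, 0x34, 0x2e, 0x33]

# Each byte is the code of one ASCII character
chars = Char.(bytes)

# Two-digit hexadecimal literals are UInt8 values
literal_type = typeof(0x34)
as_decimal   = Int(0x34)
as_hex       = string(52, base=16)

println(chars)
println(literal_type, " ", as_decimal, " ", as_hex)
```

So the byte 0x31 is the character '1', and the pair 0x0d, 0x0a is the "\r\n" line ending seen in the String version of the file.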
The key thing to remember is:
read(file): return the contents of file as a Vector of UInt8 (i.e. of bytes)
read(file, String): return the contents of file as a string.
We see here that read(file) really just views the contents of the file as a stream of data: all context about what that data represents is lost. It’s up to us to ensure we know how to interpret that stream of data (which depends on what we’re trying to do).
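As a small illustration of the same bytes being read in two ways, the sketch below uses an in-memory IOBuffer in place of a file (an assumption to keep it self-contained) and then converts between the byte view and the text view by hand.

```julia
# The same two-byte stream, read as bytes and as text
as_bytes = read(IOBuffer("42"))            # Vector{UInt8}
as_text  = read(IOBuffer("42"), String)    # String

# Converting between the two views.
# String(v) may take ownership of v, so a copy is passed to keep `as_bytes` intact.
text_from_bytes = String(copy(as_bytes))
bytes_from_text = collect(codeunits(as_text))

# Interpreting the text as a number is a further, separate step
value = parse(Int, text_from_bytes)

println(as_bytes, " ", as_text, " ", value)
```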
Writing a Binary File#
The function write(file, content) can be used to write a canonical byte representation of some object content to the given file. Note that file can either be a string giving a filepath or an IO stream created via open. The function write returns the number of bytes written to the file.
Technical note: endianness
Strictly speaking, write writes a byte representation of an object that follows the endianness convention of your machine. For data that is represented by more than one byte (e.g. an Int32 requires 4 bytes), we need to specify the order in which those bytes are stored, i.e. the endianness. See the Wikipedia entry on endianness for more details.
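To make the note concrete, here is a sketch that writes the same Int32 in native, big-endian and little-endian order into an in-memory buffer. It relies on the Base functions hton and htol and the constant Base.ENDIAN_BOM; the IOBuffer stands in for a file and is an assumption of the example.

```julia
x = Int32(1)
io = IOBuffer()

write(io, x)                  # native byte order of this machine
native = take!(io)            # take! returns the bytes and resets the buffer

write(io, hton(x))            # hton: host order to big-endian ("network") order
big = take!(io)

write(io, htol(x))            # htol: host order to little-endian order
little = take!(io)

# ENDIAN_BOM is 0x04030201 on little-endian machines
is_little_endian = Base.ENDIAN_BOM == 0x04030201

println("native: ", native)
println("big:    ", big)
println("little: ", little)
println("little-endian machine? ", is_little_endian)
```

If a binary file has to be shared between machines, fixing the byte order explicitly like this (and reversing it with ntoh or ltoh when reading) is one way to make the format independent of the hardware.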
Below are two examples: one for writing a Vector{UInt8} of arbitrary bytes, and another for writing a Vector{Int32} in your machine’s native byte order.
write(io, bytes::Vector{UInt8}) emits each byte exactly as it appears in memory.
write(io, numbers::Vector{Int32}) emits each 32-bit integer (in your machine’s native endianness).
# Prepare some sample data
bytes = UInt8[0x01, 0xFF, 0x10, 0x20] # raw bytes
numbers = Int32[1, -2, 3] # three 32-bit ints
# 1) Write raw bytes to raw.bin
open("data/raw.bin", "w") do file
    write(file, bytes)
end
# 2) Write the Int32 array to ints.bin (native endianness)
open("data/ints.bin", "w") do file
    write(file, numbers)
end
12
Note from the output that 12 bytes were written to data/ints.bin for the Int32 array: can you see why?
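Reading such a file back requires knowing the element type and the number of elements in advance, since the file itself stores only bytes. One possible approach, sketched here with a scratch file (the path is an assumption for illustration), is to preallocate an array and fill it with read!.

```julia
# Scratch file standing in for data/ints.bin
path = joinpath(mktempdir(), "ints.bin")
numbers = Int32[1, -2, 3]
nbytes = write(path, numbers)          # write returns the number of bytes written

# Preallocate a vector of the right type and length, then fill it from the stream
restored = open(path, "r") do file
    read!(file, Vector{Int32}(undef, length(numbers)))
end

println(restored)
println(nbytes == filesize(path))
```

Because the bytes were written in native order, this round trip is only guaranteed to work on a machine with the same endianness as the one that wrote the file.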
Exercise: interpreting binary files#
This exercise walks you through the process of serializing and deserializing an array of Float64 numbers, using the binary I/O described above together with the reinterpret function.
1. Create a 1D array of Float64 and use write to save this array to a file called array.bin.
2. The reinterpret function can be used to take a byte buffer (i.e. a Vector{UInt8}) and instead view it as an array with elements of a different type. (This is a zero-copy operation: the same memory is just reinterpreted.) For example, if bytes = UInt8[0x1f, 0x85, 0xeb, 0x51, 0xb8, 0x1e, 0x09, 0x40] then, on a little-endian machine, it just so happens that this sequence of 8 bytes can be interpreted as the length 1 vector [3.14] of Float64:
julia> bytes = UInt8[0x1f, 0x85, 0xeb, 0x51, 0xb8, 0x1e, 0x09, 0x40];
julia> reinterpret(Float64, bytes)
1-element reinterpret(Float64, ::Vector{UInt8}):
 3.14
Read in the data from array.bin as a vector of bytes and use reinterpret to interpret it as a vector of Float64. Assign the resulting vector to a variable data.
3. Verify that the contents of data are equal to the contents of the original 1D array that you created in part 1.
# Answer here
Remember: closing files and resources#
Always remember to close files when done. Using the do block form of open is a safe pattern to ensure closure even if an error occurs during reading/writing. Similarly, the read, readlines, eachline and write functions, when given paths to files (rather than IO streams), will ensure closure in the face of errors.
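The claim about the do block can be checked directly. In the sketch below an error is raised on purpose inside the block; the Ref named stream and the scratch file are assumptions introduced only so that the stream can be inspected afterwards.

```julia
path = joinpath(mktempdir(), "log.txt")   # hypothetical scratch file
stream = Ref{Any}(nothing)                # holds the IO stream for later inspection

try
    open(path, "w") do file
        stream[] = file
        println(file, "partial output")
        error("simulated failure")        # raised before the block finishes
    end
catch err
    # open rethrows the error after closing the file; it is swallowed here for the demo
end

closed_after_error = !isopen(stream[])
written = read(path, String)              # data written before the error was flushed on close

println(closed_after_error)
print(written)
```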
Overview of Common File Formats#
Julia supports a variety of file formats for data storage and manipulation. Below is an overview of some commonly used file formats along with links to detailed resources for each format.
Delimited Files#
Delimited text files (CSV, TSV, etc.) can be handled by Julia’s built-in DelimitedFiles standard library. It provides the readdlm and writedlm functions for reading and writing numeric and string arrays with customizable delimiters.
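A minimal round trip with DelimitedFiles might look like the following sketch. The matrix contents and the scratch file name table.csv are assumptions for illustration; the comma is passed explicitly as the delimiter.

```julia
using DelimitedFiles

path = joinpath(mktempdir(), "table.csv")   # hypothetical scratch file
data = [1.0 2.5; 3.0 4.5]                   # 2×2 matrix of Float64

writedlm(path, data, ',')                   # one row per line, comma-separated
restored = readdlm(path, ',', Float64)      # parse every field as Float64

println(restored == data)
```

For tabular data with headers, mixed column types or missing values, a dedicated package is usually a better fit than readdlm.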
CSV Files#
CSV (Comma-Separated Values) files are one of the most widely used formats for storing tabular data. Julia provides excellent support for reading and writing CSV files using the CSV.jl package.
JSON Files#
JSON (JavaScript Object Notation) is a lightweight data interchange format. Julia can handle JSON files using the JSON.jl package.
HDF5 Files#
HDF5 (Hierarchical Data Format version 5) is designed to store large amounts of data. The HDF5.jl package provides support for reading and writing HDF5 files in Julia.
Parquet Files#
Parquet is a columnar storage file format optimized for use with data analytics. The Parquet.jl package allows for reading and writing Parquet files in Julia.
NetCDF Files#
NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats. The NCDatasets.jl package is used for working with NetCDF files in Julia.
BSON Files#
BSON (Binary JSON) is a binary representation of JSON-like documents. Julia supports BSON files using the BSON.jl package.
JLD2 Files#
JLD2.jl provides a native-Julia serialization format (built on a subset of HDF5) for saving and loading arbitrary Julia data structures, without the need for the external HDF5 C library.
By using the above links, you can explore more about each file format and learn how to effectively use them in Julia.
Exercise: Exploring Other File Formats#
This is an opportunity to explore a file format that aligns with the kinds of data you typically work with, or hope to work with, in your own projects.
Select a file format from the list above (e.g., CSV, JSON, HDF5, NetCDF, etc.). Choose one file format that you believe will be most relevant to your work beyond this course.
Visit the online documentation and read a bit about the package, such as how to read and write the file format(s) that the package supports.
(You may want to come back to this part once you’ve completed the episode on package management.) Install the package and use it to:
Write a small example dataset to a file in that format.
Read the file back into Julia and print the contents to verify the data roundtrip worked.
Use the do block form of open if applicable, to ensure the file is safely closed.
Some ideas:
If you’re working with tabular data, try CSV.jl.
For structured configuration or data exchange, JSON.jl might be a good fit.
If you deal with scientific datasets, explore NetCDF or HDF5.