Input Output#
Learning Objectives#
Understand the basic concepts of Input/Output (IO) operations in Julia
Perform file handling operations, including opening, reading, writing and closing files
Read and write different file formats such as CSV, Excel, JSON and others
Use Julia packages for handling specific file formats efficiently
Implement data processing tasks using various file formats
Overview of I/O#
I/O (Input/Output) operations read data from streams and write data to streams. In practice, this most often means reading data from files and writing data to files.
File Handling#
File handling is useful in many situations, including data processing, storage and retrieval, and creating logs and configuration files.
Basic File Operations#
Opening and Closing Files#
The first argument to the function open() gives the path of the file to be opened, and the second argument sets the access mode. In the example below the file is opened for reading, denoted by "r"; write access can be requested with "w".
# Open a file in read mode
file = open("data/input_data.txt", "r")
# Close the file
close(file)
Of course, simply opening and closing a file is of little use on its own; at the very least we will want to read the data it contains.
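A common Julia idiom avoids the explicit close() call by passing a do block to open(); the file is then closed when the block finishes, even if an error is thrown inside it. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the file name and contents are hypothetical and not part of the course data):

```julia
# Work in a temporary directory so no real data is touched (hypothetical example file)
dir = mktempdir()
path = joinpath(dir, "example.txt")

# Write with a do block: the file is closed automatically when the block ends
open(path, "w") do io
    write(io, "DATA")
end

# Read with a do block: open() returns the value of the block
content = open(path, "r") do io
    read(io, String)
end

println(content)
```

This form is usually preferable in scripts, because a forgotten close() can leave written data unflushed.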
Reading from and Writing to Files#
Reading from a text file#
file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA
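Reading a whole file into one string is convenient for small files, but text data is often processed one line at a time. As a hedged sketch, the example below writes a small hypothetical file to a temporary directory and then reads it back with eachline and readlines from Julia's standard library:

```julia
# Create a small example file (hypothetical contents) in a temporary directory
dir = mktempdir()
path = joinpath(dir, "lines.txt")
write(path, "first line\nsecond line\nthird line\n")

# eachline yields one line at a time, without the trailing newline
for (i, line) in enumerate(eachline(path))
    println(i, ": ", line)
end

# readlines collects all lines into a Vector{String}
lines = readlines(path)
println(length(lines))
```

eachline is the better choice for large files, since it does not hold every line in memory at once.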
Writing to a text file#
file = open("data/input_data.txt", "w")
write(file, "DATA")
close(file)
We can print the file again to confirm that the data has been written to it.
file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA
As the output shows, the file contains only what we wrote to it, because opening a file with "w" discards any existing content. To append data to the file instead, use the mode "a" rather than "w".
# Open the file in append mode
file = open("data/input_data.txt", "a")
# Data to append
data_to_append = " APPEND DATA"
# Write the data to the file
write(file, data_to_append)
# Close the file
close(file)
# Re-open the file in read mode to verify the content
file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA APPEND DATA
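The difference between the two modes can be checked in isolation. The sketch below uses a hypothetical file in a temporary directory: it shows that "w" discards earlier content while "a" keeps it, and it uses println so that each appended record ends with a newline rather than running into the previous text.

```julia
dir = mktempdir()
path = joinpath(dir, "modes.txt")

# Initial content that is about to be overwritten
open(path, "w") do io
    write(io, "old contents")
end

# "w" truncates the file: the previous contents are discarded
open(path, "w") do io
    write(io, "DATA")
end

# "a" appends to the end of the file
open(path, "a") do io
    println(io)                  # terminate the first line
    println(io, "APPEND DATA")   # write the new record followed by a newline
end

result = read(path, String)
print(result)
```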
Reading a csv file#
To work with CSV files we will likely want to use the CSV.jl package, which can be installed as follows:
using Pkg
Pkg.add("CSV")
Resolving package versions...
No Changes to `~/Documents/CfRR/CfRR_Courses/Project.toml`
No Changes to `~/Documents/CfRR/CfRR_Courses/Manifest.toml`
Data from a .csv file can then be read in a similar manner. This is a more likely scenario in a research setting, for example when working with the output data from a weather station.
using CSV
data = CSV.File("data/exeter_university_weather.csv")
println(data)
CSV.File("data/exeter_university_weather.csv"):
Size: 5 x 6
Tables.Schema:
:Date Dates.Date
:Temperature_C Float64
Symbol("Humidity_%") Int64
:Wind_Speed_kmph Int64
:Precipitation_mm Float64
:Pressure_hPa Int64
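A CSV.File can also be iterated row by row, and its columns can be accessed by name. The sketch below is illustrative only: it parses an inline string instead of the course file, and it assumes a comma-delimited layout with the column names listed in the schema above. Note that Humidity_% is not a valid Julia identifier, so it has to be accessed through an explicit Symbol.

```julia
using CSV

# Inline stand-in for the weather file (two rows, assumed comma-delimited layout)
csv_text = """
Date,Temperature_C,Humidity_%,Wind_Speed_kmph,Precipitation_mm,Pressure_hPa
2024-05-20,15.6,82,12,0.0,1012
2024-05-21,16.1,78,15,1.2,1010
"""

rows = CSV.File(IOBuffer(csv_text))

for row in rows
    # Most columns are reachable as properties; "Humidity_%" needs a Symbol
    humidity = getproperty(row, Symbol("Humidity_%"))
    println(row.Date, ": ", row.Temperature_C, " °C, ", humidity, " % humidity")
end

# Whole columns are available as vectors
temps = rows.Temperature_C
```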
The output above gives an overview of the data in the file. To process the data, the DataFrames.jl package is useful; it can be installed as shown below.
Pkg.add("DataFrames")
Resolving package versions...
No Changes to `~/Documents/CfRR/CfRR_Courses/Project.toml`
No Changes to `~/Documents/CfRR/CfRR_Courses/Manifest.toml`
using DataFrames
# Read the CSV file into a DataFrame
df = CSV.read("data/exeter_university_weather.csv", DataFrame)
# Print the DataFrame
println(df)
5×6 DataFrame
 Row │ Date        Temperature_C  Humidity_%  Wind_Speed_kmph  Precipitation_mm  Pressure_hPa
     │ Date        Float64        Int64       Int64            Float64           Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ 2024-05-20           15.6          82               12               0.0          1012
   2 │ 2024-05-21           16.1          78               15               1.2          1010
   3 │ 2024-05-22           14.8          80               14               0.0          1011
   4 │ 2024-05-23           15.0          83               10               0.5          1013
   5 │ 2024-05-24           16.3          79               13               0.0          1012
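Once the data is in a DataFrame, simple processing steps become short expressions. The following sketch is one possible approach rather than a prescribed workflow: it rebuilds three of the columns in memory (values copied from the table above, with dates kept as plain strings for brevity) so that it runs without the CSV file, then computes a mean and selects the days with rain.

```julia
using DataFrames
using Statistics

# Values copied from the weather table; built in memory so no file is needed
df = DataFrame(
    Date = ["2024-05-20", "2024-05-21", "2024-05-22", "2024-05-23", "2024-05-24"],
    Temperature_C = [15.6, 16.1, 14.8, 15.0, 16.3],
    Precipitation_mm = [0.0, 1.2, 0.0, 0.5, 0.0],
)

# Column-wise summary statistics
mean_temp = mean(df.Temperature_C)
total_rain = sum(df.Precipitation_mm)

# Keep only the rows where some precipitation was recorded
rainy = filter(:Precipitation_mm => >(0), df)

println("Mean temperature: ", round(mean_temp, digits = 2), " °C")
println("Days with rain: ", nrow(rainy))
```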
Overview of Common File Formats#
Julia supports a variety of file formats for data storage and manipulation. Below is an overview of some commonly used file formats along with links to detailed resources for each format.
1. CSV Files#
CSV (Comma-Separated Values) files are one of the most widely used formats for storing tabular data. Julia provides excellent support for reading and writing CSV files using the CSV.jl package.
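Writing is the counterpart of the reading examples earlier on this page. As a hedged sketch, CSV.write accepts any Tables.jl-compatible table; the example below uses a NamedTuple of vectors with hypothetical station data and writes to a temporary directory, then reads the file back to check the round trip.

```julia
using CSV

dir = mktempdir()
path = joinpath(dir, "stations.csv")

# A NamedTuple of equal-length vectors is a simple Tables.jl-compatible table
table = (station = ["A", "B", "C"], temperature_c = [15.6, 16.1, 14.8])
CSV.write(path, table)

# Read the file back and inspect one column
loaded = CSV.File(path)
println(loaded.temperature_c)
```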
2. Excel Files#
Excel files (with extensions .xlsx or .xls) are commonly used for spreadsheet data. The ExcelFiles.jl and XLSX.jl packages allow for reading and writing Excel files in Julia.
3. JSON Files#
JSON (JavaScript Object Notation) is a lightweight data interchange format. Julia can handle JSON files using the JSON.jl package.
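A typical use of JSON is a small configuration file. The sketch below assumes JSON.jl is installed (Pkg.add("JSON")); the settings and file name are hypothetical. It serialises a dictionary with JSON.json, writes it to a temporary file, and parses it back with JSON.parsefile.

```julia
using JSON

dir = mktempdir()
path = joinpath(dir, "config.json")

# Hypothetical settings for a logging script
settings = Dict(
    "station" => "Exeter",
    "interval_minutes" => 10,
    "sensors" => ["temperature", "humidity"],
)

# Serialise to a JSON string and write it to disk
open(path, "w") do io
    write(io, JSON.json(settings))
end

# Parse the file back into a dictionary-like object
loaded = JSON.parsefile(path)
println(loaded["station"])
println(loaded["sensors"])
```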
4. HDF5 Files#
HDF5 (Hierarchical Data Format version 5) is designed to store large amounts of data. The HDF5.jl package provides support for reading and writing HDF5 files in Julia.
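In HDF5, arrays are stored as named datasets that can be organised into groups, much like files in folders. The following minimal sketch assumes HDF5.jl is installed; the group and dataset names are hypothetical. It uses the convenience functions h5write and h5read to store and retrieve a single array.

```julia
using HDF5

dir = mktempdir()
path = joinpath(dir, "measurements.h5")

temperatures = [15.6, 16.1, 14.8, 15.0, 16.3]

# Store the array as dataset "temperature_c" inside group "weather"
h5write(path, "weather/temperature_c", temperatures)

# Read the dataset back by its path inside the file
loaded = h5read(path, "weather/temperature_c")
println(loaded)
```

For several reads or writes against the same file, h5open with a do block keeps the file open across operations instead of reopening it each time.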
5. Feather Files#
Feather is a binary columnar data format optimized for use with data frames. Julia can read and write Feather files using the Feather.jl package.
6. Parquet Files#
Parquet is another columnar storage file format optimized for use with data analytics. The Parquet.jl package allows for reading and writing Parquet files in Julia.
7. NetCDF Files#
NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats. The NCDatasets.jl package is used for working with NetCDF files in Julia.
8. BSON Files#
BSON (Binary JSON) is a binary representation of JSON-like documents. Julia supports BSON files using the BSON.jl package.
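BSON is convenient for saving Julia values, such as intermediate results, between sessions. The sketch below assumes BSON.jl is installed; the keys and values are hypothetical. It saves a dictionary with Symbol keys to a temporary file and loads it back.

```julia
using BSON

dir = mktempdir()
path = joinpath(dir, "state.bson")

# Save a dictionary of values (keys are Symbols)
BSON.bson(path, Dict(:step => 3, :temperatures => [15.6, 16.1, 14.8]))

# Load the file back into a dictionary
loaded = BSON.load(path)
println(loaded[:step])
println(loaded[:temperatures])
```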
By using the above links, you can explore more about each file format and learn how to effectively use them in Julia.