Input Output#

Learning Objectives#

  • Understand the basic concepts of Input/Output (IO) operations in Julia

  • Perform file handling operations, including opening, reading, writing and closing files

  • Read and write different file formats such as CSV, Excel, JSON and others

  • Use Julia packages for handling specific file formats efficiently

  • Implement data processing tasks using various file formats

Overview of I/O#

I/O (Input/Output) operations involve reading data from input streams and writing data to output streams. In practice, this usually means reading data from files and writing data to files.

File Handling#

File handling is useful in many situations, including data processing, data storage and retrieval, and creating logs and configuration files.

Basic File Operations#

Opening and Closing Files#

The first argument to the open() function is the path of the file to be opened, and the second argument is the mode in which it is opened. In the example below the mode is r (read); write access can be requested with w instead.

# Open a file in read mode 
file = open("data/input_data.txt", "r")

# Close the file
close(file)
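As a minimal sketch of an alternative to calling close() by hand, open() also accepts a do-block, which closes the file automatically when the block finishes, even if an error occurs inside it. The scratch file below is hypothetical and is created in a temporary directory so that the example does not depend on data/input_data.txt.

```julia
# Hypothetical scratch file in a temporary directory (not part of the course data)
path = joinpath(mktempdir(), "example.txt")
write(path, "hello")   # convenience form: creates the file and writes the string

# The do-block form passes the open stream to the block and closes it afterwards,
# even if the block throws an error
content = open(path, "r") do io
    read(io, String)
end

println(content)
```

The value of the do-block is the value of its last expression, which is why the file contents can be assigned directly to content.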

Of course, simply opening and closing a file is not much use; at the very least we will want to read the data in the file.

Reading from and Writing to Files#

Reading from a text file#

file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA APPEND DATA
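Reading the whole file into one string is fine for small files. For larger files it can be more convenient to work line by line. The sketch below uses eachline and readlines from Julia's standard library on a hypothetical scratch file; the file name and its contents are assumptions made for illustration.

```julia
# Hypothetical scratch file with three lines
path = joinpath(mktempdir(), "lines.txt")
write(path, "first\nsecond\nthird\n")

# eachline yields one line at a time, without the trailing newline
line_lengths = Int[]
open(path, "r") do io
    for line in eachline(io)
        push!(line_lengths, length(line))
    end
end

# readlines loads every line into a Vector{String} in a single call
all_lines = readlines(path)

println(line_lengths)
println(all_lines)
```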

Writing to a text file#

file = open("data/input_data.txt", "w")
write(file, "DATA")
close(file)

We can print the file again to check that the data has been written to the file:

file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA

As the output shows, the file now contains only the content that we have just written, because opening a file with w discards any existing content. To append data to the file instead, use the mode a rather than w.

# Open the file in append mode
file = open("data/input_data.txt", "a")

# Data to append
data_to_append = " APPEND DATA"

# Write the data to the file
write(file, data_to_append)

# Close the file
close(file)

# Re-open the file in read mode to verify the content
file = open("data/input_data.txt", "r")
content = read(file, String)
println(content)
close(file)
DATA APPEND DATA
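The difference between the w and a modes can be checked in one short script. This is a sketch on a hypothetical scratch file, so it leaves data/input_data.txt untouched.

```julia
# Hypothetical scratch file in a temporary directory
path = joinpath(mktempdir(), "modes.txt")

# "w" creates the file, or empties it if it already exists
open(path, "w") do io
    write(io, "OLD CONTENT")
end

# Opening with "w" again discards "OLD CONTENT" before writing
open(path, "w") do io
    write(io, "DATA")
end

# "a" keeps the existing content and writes at the end of the file
open(path, "a") do io
    write(io, " APPEND DATA")
end

result = read(path, String)
println(result)
```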

Reading a csv file#

To work with CSV files we will likely want to use the CSV.jl package, which can be installed as below:

using Pkg
Pkg.add("CSV")
   Resolving package versions...
  No Changes to `~/Documents/CfRR/Website_Build/CfRR_Courses/Project.toml`
  No Changes to `~/Documents/CfRR/Website_Build/CfRR_Courses/Manifest.toml`

Data from a .csv file can then be read in a similar manner. This is a more likely scenario in a research setting, for example when working with the output data from a weather station.

using CSV
data = CSV.File("data/exeter_university_weather.csv")
println(data)
CSV.File("data/exeter_university_weather.csv"):
Size: 5 x 6
Tables.Schema:
 :Date                 Dates.Date
 :Temperature_C        Float64
 Symbol("Humidity_%")  Int64
 :Wind_Speed_kmph      Int64
 :Precipitation_mm     Float64
 :Pressure_hPa         Int64

The format above gives us an overview of the data in the file. If we want to process the data, the DataFrames.jl package is useful; it can be installed as shown below.

Pkg.add("DataFrames")
   Resolving package versions...
  No Changes to `~/Documents/CfRR/Website_Build/CfRR_Courses/Project.toml`
  No Changes to `~/Documents/CfRR/Website_Build/CfRR_Courses/Manifest.toml`
using DataFrames

# Read the CSV file into a DataFrame
df = CSV.read("data/exeter_university_weather.csv", DataFrame)

# Print the DataFrame
println(df)
5×6 DataFrame
 Row │ Date        Temperature_C  Humidity_%  Wind_Speed_kmph  Precipitation_mm  Pressure_hPa 
     │ Date        Float64        Int64       Int64            Float64           Int64        
─────┼────────────────────────────────────────────────────────────────────────────────────────
   1 │ 2024-05-20           15.6          82               12               0.0          1012
   2 │ 2024-05-21           16.1          78               15               1.2          1010
   3 │ 2024-05-22           14.8          80               14               0.0          1011
   4 │ 2024-05-23           15.0          83               10               0.5          1013
   5 │ 2024-05-24           16.3          79               13               0.0          1012
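Once the data is in a DataFrame, simple processing steps become short. The sketch below assumes the CSV and DataFrames packages are installed. To stay self-contained it rebuilds three of the columns from the output above by hand rather than reading the course file, and the output file name wet_days.csv is hypothetical.

```julia
using CSV, DataFrames, Statistics

# Stand-in table: three columns copied from the printed DataFrame above
df = DataFrame(
    Temperature_C = [15.6, 16.1, 14.8, 15.0, 16.3],
    Precipitation_mm = [0.0, 1.2, 0.0, 0.5, 0.0],
    Pressure_hPa = [1012, 1010, 1011, 1013, 1012],
)

# Summary statistic over one column
mean_temp = mean(df.Temperature_C)

# Keep only the rows where some precipitation was recorded
wet_days = filter(row -> row.Precipitation_mm > 0, df)

# Write the filtered table back out as a CSV file (hypothetical path)
out_path = joinpath(mktempdir(), "wet_days.csv")
CSV.write(out_path, wet_days)

println(mean_temp)
println(wet_days)
```

With the real file, the DataFrame(...) call would be replaced by the CSV.read call shown earlier.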

Overview of Common File Formats#

Julia supports a variety of file formats for data storage and manipulation. Below is an overview of some commonly used file formats along with links to detailed resources for each format.

1. CSV Files#

CSV (Comma-Separated Values) files are one of the most widely used formats for storing tabular data. Julia provides excellent support for reading and writing CSV files using the CSV.jl package.

CSV Files in Julia

2. Excel Files#

Excel files (with extensions .xlsx or .xls) are commonly used for spreadsheet data. The ExcelFiles.jl and XLSX.jl packages allow for reading and writing Excel files in Julia.

Excel Files in Julia

3. JSON Files#

JSON (JavaScript Object Notation) is a lightweight data interchange format. Julia can handle JSON files using the JSON.jl package.

JSON Files in Julia
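As a rough illustration, a round trip through a JSON file might look like the sketch below. It assumes the JSON.jl package is installed and uses its JSON.json and JSON.parse functions; the record contents and the file name are made up for the example.

```julia
using JSON

# Hypothetical record to store
record = Dict("station" => "example", "temperature_c" => 15.6, "readings" => [1, 2, 3])

# Serialise the dictionary to a JSON string and write it to a scratch file
path = joinpath(mktempdir(), "record.json")
write(path, JSON.json(record))

# Read the text back and parse it into Julia values
loaded = JSON.parse(read(path, String))

println(loaded["temperature_c"])
```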

4. HDF5 Files#

HDF5 (Hierarchical Data Format version 5) is designed to store large amounts of data. The HDF5.jl package provides support for reading and writing HDF5 files in Julia.

HDF5 Files in Julia
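A minimal sketch of storing and retrieving an array with HDF5.jl is shown below, assuming the package is installed. It uses the h5write and h5read convenience functions; the file name and the dataset name "temperature" are hypothetical.

```julia
using HDF5

# Hypothetical scratch file and dataset name
path = joinpath(mktempdir(), "example.h5")
temperatures = [15.6, 16.1, 14.8, 15.0, 16.3]

# Store the array under the dataset name "temperature"
h5write(path, "temperature", temperatures)

# Read the dataset back into a Julia array
restored = h5read(path, "temperature")

println(restored)
```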

5. Feather Files#

Feather is a binary columnar data format optimized for use with data frames. Julia can read and write Feather files using the Feather.jl package.

Feather Files in Julia

6. Parquet Files#

Parquet is another columnar storage file format optimized for use with data analytics. The Parquet.jl package allows for reading and writing Parquet files in Julia.

Parquet Files in Julia

7. NetCDF Files#

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats. The NCDatasets.jl package is used for working with NetCDF files in Julia.

NetCDF Files in Julia

8. BSON Files#

BSON (Binary JSON) is a binary representation of JSON-like documents. Julia supports BSON files using the BSON.jl package.

BSON Files in Julia

By using the above links, you can explore more about each file format and learn how to effectively use them in Julia.