NumPy#
Learning Objectives#
Understand the fundamental concepts and benefits of using NumPy for scientific computing
Create and initialize NumPy arrays from various data sources, including lists and files
Perform file operations such as saving and loading NumPy arrays to and from different file formats
Manipulate and analyze data using array attributes and methods
Execute arithmetic operations, including matrix and statistical operations, on NumPy arrays
Apply advanced techniques such as array slicing, indexing, and broadcasting to efficiently handle large datasets
What is NumPy?#
NumPy is a key Python package for scientific computing. It provides support for large, multi-dimensional arrays and matrices, alongside mathematical functions that operate on these arrays, with applications across numerical analysis, linear algebra, Fourier transforms and random number generation.
Importing NumPy#
The first step is to import the package into the current environment.
import numpy as np
Understanding Data Types and Arrays#
Array Creation#
The main data structure within NumPy is an array, providing efficient storage and manipulation for large numerical datasets.
From a list#
A simple way of creating a NumPy array is by passing a list.
array_from_list = np.array([1,2,3,4,5])
display(array_from_list)
array([1, 2, 3, 4, 5])
Zeros and Ones#
There are also several ways to create arrays filled with a single value, such as an array of ones (np.ones) or an array of zeros (np.zeros). In both cases we pass an argument giving the desired number of elements.
zeros_array = np.zeros(12)
display(zeros_array)
ones_array = np.ones(12)
display(ones_array)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
A Range#
It is also possible to create more complex arrays without manually entering the values in a list. For example, np.arange() creates a range of values. Note that, by default, the array starts from zero.
range_array = np.arange(12)
display(range_array)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Exercise 1 - Creating an array#
Using the np.arange() function, create this NumPy array: [5,8,11,14,17,20]. It may be helpful to refer to the NumPy documentation for np.arange().
#custom_range_array = *your code here*
Exercise 1 Solution
custom_range_array = np.arange(5,23,3)
Other Methods#
NumPy offers many other functions for creating arrays, and which one is most useful depends on the task at hand. Two of the many available options are arrays of random numbers and identity matrices; the details of each function are in the documentation.
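As a rough illustration of the creation functions mentioned above, the sketch below builds an identity matrix, an evenly spaced array, a constant-filled array and an array of random numbers. The variable names and the seed value are arbitrary choices for this example, and print is used in place of display so that the snippet runs outside a notebook.

```python
import numpy as np

# Identity matrix: ones on the diagonal, zeros elsewhere
identity = np.eye(3)

# Evenly spaced values: 5 points from 0 to 1, both ends included
evenly_spaced = np.linspace(0, 1, 5)

# Array of a given shape filled with a chosen constant
sevens = np.full((2, 3), 7)

# Random floats in [0, 1) from a seeded generator (the seed is arbitrary)
rng = np.random.default_rng(seed=42)
random_values = rng.random(4)

print(identity)
print(evenly_spaced)
print(sevens)
print(random_values)
```

Seeding the generator makes the random values reproducible between runs, which can be helpful when sharing results.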
File Output#
To save the data created so far, we can use np.save, which saves an array to a binary file with the .npy extension.
range_array = np.arange(12)
np.save('data/example_range_array.npy', range_array)
Exercise 2 - Saving to a text file#
A range of other file formats can also be used for storing NumPy arrays. Using the NumPy documentation, save the array to another file format, such as a .txt file.
Exercise 2 Solution
np.savetxt('data/example_range_array.txt', range_array)
File Input#
It is also possible to use np.load to read in an array from a .npy file.
downloaded_numpy_array = np.load('data/multidimensional_array.npy')
display(downloaded_numpy_array)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
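To see how saving and loading fit together, here is a minimal round-trip sketch. It writes to a temporary directory rather than the data/ folder used above, and the file names are made up for the example. It also loads the text file with np.loadtxt, which is assumed here as the natural counterpart to np.savetxt.

```python
import os
import tempfile

import numpy as np

original = np.arange(12).reshape(3, 4)

# Write to a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "example.npy")
    txt_path = os.path.join(tmp_dir, "example.txt")

    np.save(npy_path, original)      # binary format, keeps the dtype
    np.savetxt(txt_path, original)   # plain text, one row per line

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)

# The .npy file restores the original integer dtype;
# the text file comes back as floating point by default
print(from_npy.dtype)
print(from_txt.dtype)
```

The values are the same in both cases, but only the .npy format preserves the data type without extra arguments.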
Array Attributes#
You will often be sent data to analyse as a bare data file, as in the loading example above. In this situation, one of the first things you may wish to do is get an overview of the data by accessing the array's attributes.
The number of dimensions of an array can be found with .ndim.
display("The number of dimensions in the array is: " + str(downloaded_numpy_array.ndim))
'The number of dimensions in the array is: 2'
The overall shape of the array can be found with .shape.
display("The overall shape of the array is: " + str(downloaded_numpy_array.shape))
'The overall shape of the array is: (3, 3)'
The number of values in the array can be found with .size, which gives the total number of elements.
display("The overall size of the array is: " + str(downloaded_numpy_array.size))
'The overall size of the array is: 9'
As arrays can hold a range of different data types, it is important to check that the data is in the form you expect, which can be done with .dtype. One reason to check the data type is to understand how many bytes each element takes, found via .itemsize. As a dataset grows, the amount of memory used becomes increasingly important; the memory footprint of the whole array is found via .nbytes.
display("The datatype within the array is: " + str(downloaded_numpy_array.dtype))
display("The memory footprint for each of the elements of the array is: " + str(downloaded_numpy_array.itemsize))
display("The overall memory footprint of the array is: " + str(downloaded_numpy_array.nbytes))
'The datatype within the array is: int64'
'The memory footprint for each of the elements of the array is: 8'
'The overall memory footprint of the array is: 72'
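The relationship between these attributes can be checked directly: the total footprint is the number of elements multiplied by the bytes per element. The sketch below rebuilds the same 3x3 values by hand (so it does not depend on the data file) and shows, as an illustration only, how converting to a smaller integer type reduces the footprint when the values are known to fit.

```python
import numpy as np

values = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=np.int64)

# nbytes is simply size * itemsize
print(values.size, values.itemsize, values.nbytes)  # 9 8 72

# All values fit in one signed byte, so int8 is safe for this data
compact = values.astype(np.int8)
print(compact.itemsize, compact.nbytes)  # 1 9
```

Converting to a smaller type is only safe when every value lies within the range of the target type.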
Data Types#
Within NumPy, the elements of an array can have a range of different data types, including:
Boolean (bool_): Used for storing True and False values.
Integer (int_, int8, int16, int32, int64): Signed integer types of various sizes.
Unsigned Integer (uint8, uint16, uint32, uint64): Unsigned integer types of various sizes.
Floating Point (float16, float32, float64): Floating point numbers of various precisions.
Complex Numbers (complex64, complex128): Complex numbers with floating point real and imaginary parts.
Fixed-length strings (bytes_, named string_ before NumPy 2.0): Used for text data stored as bytes.
Object (object_): Used for arrays of Python objects.
Examples for each of these different types can be seen below.
# Boolean array
bool_arr = np.array([True, False, True], dtype=np.bool_)
# Integer array
int_arr = np.array([1, 2, 3], dtype=np.int32)
# Floating point array
float_arr = np.array([1.0, 2.0, 3.0], dtype=np.float64)
# Complex number array
complex_arr = np.array([1+2j, 3+4j], dtype=np.complex128)
# String array
string_arr = np.array(["apple", "banana", "cherry"], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
# Object array
object_arr = np.array([1, "two", 3.0], dtype=np.object_)
# Print the arrays and their data types
print("Boolean Array:", bool_arr, "Data Type:", bool_arr.dtype)
print("Integer Array:", int_arr, "Data Type:", int_arr.dtype)
print("Floating Point Array:", float_arr, "Data Type:", float_arr.dtype)
print("Complex Number Array:", complex_arr, "Data Type:", complex_arr.dtype)
print("String Array:", string_arr, "Data Type:", string_arr.dtype)
print("Object Array:", object_arr, "Data Type:", object_arr.dtype)
Boolean Array: [ True False True] Data Type: bool
Integer Array: [1 2 3] Data Type: int32
Floating Point Array: [1. 2. 3.] Data Type: float64
Complex Number Array: [1.+2.j 3.+4.j] Data Type: complex128
String Array: [b'apple' b'banana' b'cherry'] Data Type: |S6
Object Array: [1 'two' 3.0] Data Type: object
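When choosing between the integer and floating point sizes listed above, it can help to look up the limits of each type. The following sketch uses np.iinfo and np.finfo for that purpose and also shows how NumPy infers a type when none is given; the variable names are arbitrary.

```python
import numpy as np

# Smallest and largest values representable by two integer types
int8_limits = np.iinfo(np.int8)
uint8_limits = np.iinfo(np.uint8)
print(int8_limits.min, int8_limits.max)    # -128 127
print(uint8_limits.min, uint8_limits.max)  # 0 255

# Machine epsilon: the gap between 1.0 and the next representable float32
float32_limits = np.finfo(np.float32)
print(float32_limits.eps)

# Without an explicit dtype, NumPy picks a type that can hold every element
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```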
Array Indexing#
Array indexing is the process of accessing individual or groups of elements from an array. Within NumPy there are a range of different methods of indexing elements, as shown below.
display("The complete array is: ")
display(downloaded_numpy_array)
# Single element indexing
display("Single element indexing: " + str(downloaded_numpy_array[2]))
# Multi-dimensional indexing
display("Multi-dimensional element indexing: " + str(downloaded_numpy_array[2][1]))
# Boolean indexing
display("Boolean conditions: ")
display(downloaded_numpy_array > 4)
display("Boolean indexing: " + str(downloaded_numpy_array[downloaded_numpy_array > 4]))
# Fancy indexing (selecting rows 1 and 2 with a list of indices)
display("Complex indexing: " )
display(downloaded_numpy_array[[1, 2]])
'The complete array is: '
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
'Single element indexing: [7 8 9]'
'Multi-dimensional element indexing: 8'
'Boolean conditions: '
array([[False, False, False],
[False, True, True],
[ True, True, True]])
'Boolean indexing: [5 6 7 8 9]'
'Complex indexing: '
array([[4, 5, 6],
[7, 8, 9]])
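A few further indexing patterns are sketched below, using a hand-built copy of the same 3x3 values. The comma form grid[2, 1] selects the same element as grid[2][1] in a single step, and np.where is one way (assumed here for illustration) to recover the positions behind a Boolean condition.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Row and column in one indexing operation
element = grid[2, 1]
print(element)  # 8

# Fancy indexing with two lists picks individual (row, column) pairs:
# here (0, 1) and (2, 0)
picked = grid[[0, 2], [1, 0]]
print(picked)  # [2 7]

# Row and column positions of every element greater than 4
rows, cols = np.where(grid > 4)
print(rows)  # [1 1 2 2 2]
print(cols)  # [1 2 0 1 2]
```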
Array Slicing#
Array slicing works in a similar way to slicing Python lists, with some additional capabilities such as slicing along several dimensions at once. Some examples of slicing can be seen below.
display("Simple slicing: ")
display(downloaded_numpy_array[1:3])
display("Multi-dimensional slicing: ")
display(downloaded_numpy_array[:, 1:])
display("Step slicing: ")
display(downloaded_numpy_array[::2])
display("Negative slicing:")
display(downloaded_numpy_array[-1:])
'Simple slicing: '
array([[4, 5, 6],
[7, 8, 9]])
'Multi-dimensional slicing: '
array([[2, 3],
[5, 6],
[8, 9]])
'Step slicing: '
array([[1, 2, 3],
[7, 8, 9]])
'Negative slicing:'
array([[7, 8, 9]])
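One difference from Python lists that is worth knowing: a basic slice of a NumPy array is a view onto the same memory, not a copy. The short sketch below, with made-up variable names, shows that writing through a slice changes the original array, and that .copy() avoids this.

```python
import numpy as np

data = np.arange(6)          # [0 1 2 3 4 5]

# A basic slice shares memory with the original array
view = data[1:4]
view[0] = 99
print(data)                  # [ 0 99  2  3  4  5]
print(np.shares_memory(data, view))  # True

# An explicit copy is independent of the original
independent = data[1:4].copy()
independent[0] = -1
print(data)                  # [ 0 99  2  3  4  5]
```

Views avoid duplicating large datasets in memory, so an explicit copy is only needed when the original must stay untouched.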
Array Operations#
Arithmetic Operations#
NumPy can perform a range of element-wise arithmetic operations on arrays, such as addition, subtraction, multiplication and division, as seen below.
array_1 = np.array([1,2,3,4,5])
array_2 = np.array([3,3,3,3,3])
display("Array 1: " + str(array_1))
display("Array 2: " + str(array_2))
# Addition
display("Adding Array 1 to Array 2: " + str(array_1 + array_2))
# Subtraction
display("Subtracting Array 2 from Array 1: " + str(array_1 - array_2))
# Multiplication
display("Multiplying Array 1 and 2 together: " + str(array_1 * array_2))
# Division
display("Dividing Array 1 by Array 2: " + str(array_1 / array_2))
'Array 1: [1 2 3 4 5]'
'Array 2: [3 3 3 3 3]'
'Adding Array 1 to Array 2: [4 5 6 7 8]'
'Subtracting Array 2 from Array 1: [-2 -1 0 1 2]'
'Multiplying Array 1 and 2 together: [ 3 6 9 12 15]'
'Dividing Array 1 by Array 2: [0.33333333 0.66666667 1. 1.33333333 1.66666667]'
Matrix Operations#
In addition to operations on simple arrays, NumPy supports operations on more complex data structures such as matrices.
Dot Product#
display("Dot product of array 1 and array 2:")
display(np.dot(array_1, array_2))
'Dot product of array 1 and array 2:'
45
The dot product can also be computed for two-dimensional arrays, where it corresponds to matrix multiplication.
# Dot for two-dimensional arrays
A = np.array([[1, 2], [6, 4]])
B = np.array([[2, 4], [1, 3]])
print(np.dot(A, B))
[[ 4 10]
[16 36]]
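For two-dimensional arrays the @ operator gives the same matrix product as np.dot, and it is easy to confuse with the element-wise * operator. The sketch below reuses the same two matrices to contrast the two, and also shows the transpose via .T.

```python
import numpy as np

A = np.array([[1, 2], [6, 4]])
B = np.array([[2, 4], [1, 3]])

# Matrix product: rows of A combined with columns of B
matrix_product = A @ B
print(matrix_product)   # [[ 4 10]
                        #  [16 36]]

# Element-wise product: each entry multiplied by its counterpart
elementwise = A * B
print(elementwise)      # [[ 2  8]
                        #  [ 6 12]]

# Transpose: rows become columns
print(A.T)              # [[1 6]
                        #  [2 4]]
```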
Statistical Operations#
NumPy provides a range of functions for computing statistics on a given array, with some key ones shown below.
input_array = np.array([21,60,2,5,10,47,2,10])
display("Mean: " + str(np.mean(input_array)))
display("Median: " + str(np.median(input_array)))
display("Standard Deviation: " + str(np.std(input_array)))
'Mean: 19.625'
'Median: 10.0'
'Standard Deviation: 20.6212117733173'
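With multi-dimensional data, these functions accept an axis argument to compute a statistic per column or per row instead of over the whole array. The sketch below uses small made-up arrays to show this. It also notes that np.std computes the population standard deviation by default, with ddof=1 giving the sample version.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall_mean = scores.mean()        # over every element
column_means = scores.mean(axis=0)  # collapse the rows: one mean per column
row_means = scores.mean(axis=1)     # collapse the columns: one mean per row

print(overall_mean)   # 3.5
print(column_means)   # [2.5 3.5 4.5]
print(row_means)      # [2. 5.]

sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])
population_std = np.std(sample)       # divides by N (the default)
sample_std = np.std(sample, ddof=1)   # divides by N - 1
print(population_std)  # 2.0
print(sample_std)
```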
Logical Operations#
NumPy can also perform logical operations on arrays that contain Boolean elements.
x = np.array([False, True, False, False, False])
y = np.array([True, False, True, False, True])
display("AND logical operations:")
display(np.logical_and(x,y))
display("OR logical operations:")
display(np.logical_or(x,y))
'AND logical operations:'
array([False, False, False, False, False])
'OR logical operations:'
array([ True, True, True, False, True])
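Logical operations are often combined with comparisons to filter data. As a minimal sketch with made-up values, the & and ~ operators below act element-wise on Boolean arrays in the same way as np.logical_and and np.logical_not. The parentheses around each comparison are required because & binds more tightly than > and <.

```python
import numpy as np

values = np.array([1, 4, 6, 9, 12])

# Combine two conditions element-wise
in_range = (values > 3) & (values < 10)
print(in_range)          # [False  True  True  True False]

# Use the combined mask to select matching elements
selected = values[in_range]
print(selected)          # [4 6 9]

# Invert the mask to select everything else
outside = values[~in_range]
print(outside)           # [ 1 12]
```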
Shape Manipulations#
It is also possible to alter the shape of a given array without modifying the data within it. Two key operations for manipulating the shape of an array, .reshape and .flatten, are shown below.
reshape_array = np.array([[1,5,2], [7, 9, 10]])
display(reshape_array)
display("Reshape the array from 2x3 to 3x2:")
display(reshape_array.reshape(3,2))
display("Convert from 2D to 1D:")
display(reshape_array.flatten())
array([[ 1, 5, 2],
[ 7, 9, 10]])
'Reshape the array from 2x3 to 3x2:'
array([[ 1, 5],
[ 2, 7],
[ 9, 10]])
'Convert from 2D to 1D:'
array([ 1, 5, 2, 7, 9, 10])
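Two details of reshaping are sketched below with made-up variable names. Passing -1 for one dimension asks NumPy to work out that dimension from the total number of elements, and a reshape only succeeds when the new shape holds exactly the same number of elements. The example also shows that .flatten returns a copy, so changing the result leaves the source array alone.

```python
import numpy as np

values = np.arange(6)               # [0 1 2 3 4 5]

# -1 lets NumPy infer the missing dimension: 6 elements / 2 rows = 3 columns
two_rows = values.reshape(2, -1)
print(two_rows.shape)               # (2, 3)

# flatten returns a copy, so the source array is not modified
flat_copy = two_rows.flatten()
flat_copy[0] = 100
print(two_rows[0, 0])               # 0

# A shape with a different number of elements is rejected
reshape_error = None
try:
    values.reshape(4, 2)            # 8 slots for 6 elements
except ValueError as error:
    reshape_error = str(error)
    print("Reshape failed:", reshape_error)
```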
Set Operations#
NumPy also offers support for set operations, such as getting both the union and the intersection of elements for different arrays as shown below.
array_1 = np.array([1,2,3,4,5])
array_2 = np.array([3,5,3,2,3])
display("Union of array 1 and array 2: ")
display(np.union1d(array_1, array_2))
display("Intersection of array 1 and array 2: ")
display(np.intersect1d(array_1, array_2))
'Union of array 1 and array 2: '
array([1, 2, 3, 4, 5])
'Intersection of array 1 and array 2: '
array([2, 3, 5])
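Beyond union and intersection, a few related functions cover other common set-style questions. The sketch below reuses the same two arrays and is an illustrative selection only, not a complete list.

```python
import numpy as np

array_1 = np.array([1, 2, 3, 4, 5])
array_2 = np.array([3, 5, 3, 2, 3])

# Sorted unique values of a single array
unique_values = np.unique(array_2)
print(unique_values)     # [2 3 5]

# Values in array_1 that do not appear in array_2
difference = np.setdiff1d(array_1, array_2)
print(difference)        # [1 4]

# Element-wise membership test: is each value of array_1 present in array_2?
membership = np.isin(array_1, array_2)
print(membership)        # [False  True  True False  True]
```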
Broadcasting#
Broadcasting in NumPy allows operations to be performed on arrays of different, but compatible, shapes and numbers of dimensions. Examples of broadcasting are shown below:
# Adding a scalar to an array
a = np.array([0, 1,2,3,4,5,6,7,8,9])
b = 10
display("Scalar addition to an array: " )
display(a + b)
# Adding a one dimensional array to a two dimensional array
c = np.array([[10,11,12,13,14,15,16,17,18,19], [20,21,22,23,24,25,26,27,28,29]])
display("Adding a one dimensional array to a two dimensional array: " )
display(a + c)
'Scalar addition to an array: '
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
'Adding a one dimensional array to a two dimensional array: '
array([[10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
[20, 22, 24, 26, 28, 30, 32, 34, 36, 38]])
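Shapes are compatible for broadcasting when, compared from the last dimension backwards, each pair of dimensions is either equal or one of them is 1. The sketch below, with made-up arrays, shows a row and a column being stretched against each other, followed by a pair of shapes that cannot be broadcast.

```python
import numpy as np

row = np.array([0, 1, 2])          # shape (3,)
column = np.array([[10], [20]])    # shape (2, 1)

# (3,) and (2, 1) broadcast to (2, 3): the row is repeated down,
# the column is repeated across
table = row + column
print(table.shape)   # (2, 3)
print(table)         # [[10 11 12]
                     #  [20 21 22]]

# Lengths 3 and 2 differ and neither is 1, so broadcasting fails
broadcast_error = None
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as error:
    broadcast_error = str(error)
    print("Broadcast failed:", broadcast_error)
```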
Exercise 3 - Performing Analysis#
Given the dataset loaded into input_data below, answer the following three questions, some of which may require engaging with the NumPy documentation.
What is the average value of the elements in the array?
How many unique numbers are there in the array?
What is the standard deviation of the array values?
input_data = np.load("data/exercise_3_array.npy")
Exercise 3 Solution
The average value for the dataset is 490.99, with 622 unique numbers and a standard deviation of 287.4. The Python code using NumPy to compute the results is shown below:
# 1. Calculate the average value of the elements
average_value = np.mean(input_data)
# 2. Find all unique numbers and their count
unique_numbers, counts = np.unique(input_data, return_counts=True)
# 3. Compute the standard deviation of the array values
standard_deviation = np.std(input_data)
# Printing the results
print("Average value:", average_value)
print("Unique numbers count:", len(unique_numbers))
print("Standard deviation:", standard_deviation)