NumPy and Pandas
Introduction to NumPy and Pandas
NumPy and Pandas are two popular Python libraries used for numerical computing, data manipulation, and data analysis.
- To install NumPy, open a command prompt or terminal window and run the following command:
pip3 install numpy
- To install Pandas, open a command prompt or terminal window and run the following command:
pip3 install pandas
Note: To install NumPy and Pandas in Google Colab, run the following commands in a notebook cell:
!pip3 install numpy
!pip3 install pandas
NumPy
NumPy is a library for numerical computing in Python.
It provides a powerful array object that can hold and manipulate large amounts of data efficiently.
NumPy also provides many mathematical functions for array operations, including linear algebra, Fourier analysis, and random number generation.
Here's an example of how to use NumPy to perform some basic operations:
import numpy as np
# Create a 2D NumPy array
a = np.array([[1, 2], [3, 4]])
# Print the array and its shape
print(a)
print(a.shape)
# Output:
# [[1 2]
#  [3 4]]
# (2, 2)
# Compute the mean of each column (axis=0) and of each row (axis=1)
print(np.mean(a, axis=0))
print(np.mean(a, axis=1))
# Output:
# [2. 3.]
# [1.5 3.5]
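The overview above also mentions linear algebra, Fourier analysis, and random number generation. The following is a minimal sketch of each; the example system of equations, the seed, and the sample size are illustrative choices, not values from the original text.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b (A and b are illustrative values)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier analysis: a constant signal has all its energy in the zero-frequency bin
spectrum = np.fft.fft(np.ones(4))
print(spectrum.real)

# Random numbers: a seeded generator gives reproducible draws (seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.shape)
print(samples.mean())  # close to 0 for a large sample
```

Using np.linalg.solve is usually preferable to computing an inverse explicitly when the goal is only to solve a system of equations.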
Pandas
Pandas is a library for data manipulation and analysis in Python.
It provides two primary data structures: Series (1-dimensional) and DataFrame (2-dimensional).
Pandas allows you to read data from various sources, including CSV, Excel, SQL databases, and web APIs.
Pandas also provides many functions for data manipulation, including filtering, grouping, sorting, merging, and reshaping.
Here's an example of how to use Pandas to read a CSV file and perform some basic operations:
import pandas as pd
# Read a CSV file into a Pandas DataFrame
df = pd.read_csv("data.csv")
# Print the first five rows of the DataFrame
print(df.head())
# Output:
#    id       name  age  gender
# 0   1       John   25    male
# 1   2       Jane   30  female
# 2   3  Alexander   22    male
# 3   4       Anne   28  female
# 4   5      Brian   35    male
# Compute some basic statistics on the age column
print(df["age"].describe())
# Output:
# count     5.000000
# mean     28.000000
# std       4.949747
# min      22.000000
# 25%      25.000000
# 50%      28.000000
# 75%      30.000000
# max      35.000000
# Name: age, dtype: float64
In this example, we read a CSV file into a Pandas DataFrame and printed the first five rows using the head method. We also computed some basic statistics on the age column using the describe method.
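The example above depends on a data.csv file that is not included here. As a self-contained sketch, the same rows can be loaded from an in-memory string; the CSV text below is a hypothetical reconstruction from the printed rows, not the original file.

```python
import io
import pandas as pd

# Hypothetical contents of data.csv, reconstructed from the printed rows
csv_text = """id,name,age,gender
1,John,25,male
2,Jane,30,female
3,Alexander,22,male
4,Anne,28,female
5,Brian,35,male
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(df.dtypes)         # column types inferred by Pandas
print(df["age"].mean())  # mean of the age column
```

This makes the snippet runnable anywhere, including Google Colab, without uploading a file first.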
Arrays and Matrices in NumPy
Arrays and matrices are fundamental data structures in NumPy, a powerful numerical computing library in Python. Here's a brief overview:
Arrays
- Arrays in NumPy are homogeneous, multi-dimensional collections of data of the same type.
- They can have any number of dimensions, such as 1D, 2D, 3D, etc.
- Arrays are commonly used for representing vectors, matrices, and multi-dimensional data.
- NumPy provides a wide range of functions for creating, manipulating, and performing mathematical operations on arrays.
import numpy as np
# Creating a 1D array
a = np.array([1, 2, 3, 4, 5])
# Creating a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
# Accessing array elements
print(a[0]) # Output: 1
print(b[1, 2]) # Output: 6
# Performing operations on arrays
c = np.array([1, 2, 3])
d = np.array([4, 5, 6])
print(c + d) # Output: [5 7 9]
print(c * d) # Output: [ 4 10 18]
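To make the "homogeneous" and "multi-dimensional" points concrete, here is a small sketch; the variable names and values are illustrative.

```python
import numpy as np

# Mixing integers and a float: NumPy upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# ndim, shape, and size describe the layout of an array
grid = np.zeros((2, 3, 4))
print(grid.ndim)     # 3
print(grid.shape)    # (2, 3, 4)
print(grid.size)     # 24

# astype returns a converted copy; float values are truncated toward zero
as_int = mixed.astype(int)
print(as_int)        # [1 2 3]
```

Because every element shares one data type, NumPy can store the values in a compact block of memory, which is a large part of why array operations are fast.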
Matrices
- Matrices in NumPy are represented as 2D arrays with a defined number of rows and columns.
- They are used for representing mathematical matrices and performing matrix operations, such as matrix multiplication, determinant, inverse, etc.
- NumPy provides functions for creating, manipulating, and performing various matrix operations.
import numpy as np
# Creating a matrix
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Accessing matrix elements
print(a[0, 1]) # Output: 2
# Performing matrix operations
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
print(np.dot(a, b))
# Output:
# [[ 30  24  18]
#  [ 84  69  54]
#  [138 114  90]]
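For 2D arrays, the @ operator is a common shorthand for the same product. The sketch below reuses the two matrices from the example and adds one illustrative array, c, to show how shapes combine.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

# For 2D arrays, the @ operator gives the same result as np.dot and np.matmul
print(np.array_equal(a @ b, np.dot(a, b)))   # True

# Matrix multiplication is generally not commutative
print(np.array_equal(a @ b, b @ a))          # False

# Shapes must be compatible: (2, 3) @ (3, 2) produces a (2, 2) result
c = np.arange(6).reshape(2, 3)
print((c @ c.T).shape)                       # (2, 2)
```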
Arrays and matrices in NumPy provide powerful capabilities for working with numerical data in Python, making it a popular choice for data science, machine learning, and scientific computing applications. We will look at matrix and array operations in more detail in Mathematics Fundamental.
Operations in Matrices and Arrays
Array Creation : NumPy provides various functions to create arrays, such as np.array(), np.zeros(), np.ones(), np.arange(), np.linspace(), etc. You can specify the shape, data type, and other properties while creating arrays.
import numpy as np
# Creating an array with np.array()
a = np.array([1, 2, 3, 4, 5])
print(a) # Output: [1 2 3 4 5]
# Creating an array with np.zeros()
b = np.zeros((3, 4)) # 3x4 array filled with zeros
print(b)
"""
Output:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
"""
# Creating an array with np.ones()
c = np.ones((2, 3), dtype=int) # 2x3 array filled with ones of integer data type
print(c)
"""
Output:
[[1 1 1]
[1 1 1]]
"""
# Creating an array with np.arange()
d = np.arange(1, 10, 2) # Array with values 1 to 9 with step of 2
print(d) # Output: [1 3 5 7 9]
# Creating an array with np.linspace()
e = np.linspace(0, 1, 5) # Array with 5 equally spaced values between 0 and 1
print(e) # Output: [0.   0.25 0.5  0.75 1.  ]
Array Indexing and Slicing : You can access individual elements or slices of arrays using indexing and slicing in NumPy. Indexing starts from 0, and negative indices can be used to access elements from the end of the array.
import numpy as np
# Array indexing
a = np.array([1, 2, 3, 4, 5])
print(a[0]) # Output: 1
print(a[-1]) # Output: 5
# Array slicing
b = np.array([1, 2, 3, 4, 5])
print(b[1:4]) # Output: [2 3 4]
print(b[:3]) # Output: [1 2 3]
print(b[2:]) # Output: [3 4 5]
print(b[::2]) # Output: [1 3 5]
Array Reshaping : You can change the shape of arrays using the reshape() function in NumPy. The reshaped array has the same data but a different shape.
import numpy as np
# Array reshaping
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((2, 3)) # Reshape to a 2x3 array
print(b)
"""
Output:
[[1 2 3]
[4 5 6]]
"""
# Flattening an array
c = b.flatten() # Flatten the 2D array to a 1D array
print(c) # Output: [1 2 3 4 5 6]
Array Operations : NumPy provides a wide range of mathematical and element-wise operations for arrays, such as arithmetic operations (+, -, *, /), element-wise functions (sin, cos, exp, sqrt, etc.), linear algebra operations (dot product, matrix multiplication, etc.), statistical operations (mean, sum, min, max, etc.), and many more.
import numpy as np
# Array arithmetic operations
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = a + b # Element-wise addition
d = a * b # Element-wise multiplication
e = a / b # Element-wise division
print(c) # Output: [6 6 6 6 6]
print(d) # Output: [5 8 9 8 5]
print(e) # Output: [0.2 0.5 1. 2. 5. ]
# Element-wise functions
f = np.sin(a) # Element-wise sine function
g = np.exp(b) # Element-wise exponential function
h = np.sqrt(a) # Element-wise square root function
print(f) # Output: [0.84147098 0.90929743 0.14112001 -0.7568025 -0.95892427]
print(g) # Output: [148.4131591 54.59815003 20.08553692 7.3890561 2.71828183]
print(h) # Output: [1. 1.41421356 1.73205081 2. 2.23606798]
# Linear algebra operations
i = np.dot(a, b) # Dot product of two arrays
j = np.matmul(a, b) # For two 1D arrays, matmul also returns the inner product
print(i) # Output: 35
print(j) # Output: 35
# Statistical operations
k = np.mean(a) # Mean of array a
l = np.sum(b) # Sum of array b
m = np.min(a) # Minimum value of array a
n = np.max(b) # Maximum value of array b
print(k) # Output: 3.0
print(l) # Output: 15
print(m) # Output: 1
print(n) # Output: 5
Matrix operations are an important aspect of numerical computing and data analysis. NumPy, a popular Python library for numerical computing, provides various functions and methods for performing matrix operations efficiently. Here are some examples of matrix operations in NumPy:
import numpy as np
# Matrix creation
A = np.array([[1, 2], [3, 4]]) # 2x2 matrix
B = np.array([[5, 6], [7, 8]]) # 2x2 matrix
# Matrix addition
C = A + B
print(C) # Output: [[ 6 8]
# [10 12]]
# Matrix subtraction
D = A - B
print(D) # Output: [[-4 -4]
# [-4 -4]]
# Matrix multiplication
E = np.dot(A, B)
F = A.dot(B) # Equivalent to np.dot(A, B)
print(E) # Output: [[19 22]
# [43 50]]
print(F) # Output: [[19 22]
# [43 50]]
# Element-wise matrix multiplication
G = A * B
print(G) # Output: [[ 5 12]
# [21 32]]
# Matrix transpose
H = A.T
print(H) # Output: [[1 3]
# [2 4]]
# Matrix inverse
I = np.linalg.inv(A)
print(I) # Output: [[-2. 1. ]
# [ 1.5 -0.5]]
# Matrix determinant
det_A = np.linalg.det(A)
print(det_A) # Output: -2.0000000000000004 (approximately -2; the small error is floating-point rounding)
# Matrix rank
rank_A = np.linalg.matrix_rank(A)
print(rank_A) # Output: 2
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues) # Output: [-0.37228132  5.37228132]
print(eigenvectors)
# Output: [[-0.82456484 -0.41597356]
# [ 0.56576746 -0.90937671]]
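Results such as the inverse and the eigenvectors are floating-point approximations, so it can help to check them numerically rather than by eye. A minimal sketch, using np.allclose and np.isclose to compare with a tolerance:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# A matrix times its inverse should give the identity (up to rounding error)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))            # True

# Each eigenpair should satisfy A @ v = lambda * v
eigenvalues, eigenvectors = np.linalg.eig(A)
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]                          # eigenvectors are stored as columns
    print(np.allclose(A @ v, eigenvalues[i] * v))   # True

# The determinant equals the product of the eigenvalues
print(np.isclose(np.linalg.det(A), np.prod(eigenvalues)))  # True
```

Note that the eigenvectors are the columns of the returned array, not the rows, which is why the loop slices with [:, i].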
DataFrames and Series in Pandas
DataFrames and Series are two main data structures provided by the Pandas library in Python for data manipulation and analysis. Here's a brief overview :
Series
A Series is a one-dimensional labeled array that can hold various data types such as integers, floats, strings, etc. It is similar to a NumPy array, but with labels or indices associated with each element. Series are created using the pd.Series() function in Pandas. Example:
import pandas as pd
# Create a Series
s = pd.Series([1, 2, 3, 4, 5])
print(s)
# Output:
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64
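The example above uses the default integer labels. To show what the labels add compared with a plain NumPy array, here is a short sketch with explicit string labels; the names and values are illustrative.

```python
import pandas as pd

# A Series with explicit string labels (labels and values are illustrative)
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"])

print(prices["banana"])   # access by label
print(prices.iloc[0])     # access by integer position

# Arithmetic aligns on labels, not positions; unmatched labels become NaN
quantities = pd.Series([10, 4], index=["cherry", "apple"])
totals = prices * quantities
print(totals)
```

Here "banana" has no matching quantity, so its total is NaN, while "apple" and "cherry" are matched by label even though they appear in a different order.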
DataFrame
A DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). It can be thought of as a collection of Series objects, where each Series represents a column of data. DataFrames are useful for storing and manipulating data that can be thought of as spreadsheet-like or SQL table-like.
import pandas as pd
# Create a dictionary of data
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
'age': [25, 32, 18, 47],
'gender': ['F', 'M', 'M', 'M']}
# Create a DataFrame from the dictionary
df = pd.DataFrame(data)
# Display the DataFrame
print(df)
# Output:
#       name  age gender
# 0    Alice   25      F
# 1      Bob   32      M
# 2  Charlie   18      M
# 3    David   47      M
# Accessing columns in a DataFrame
print(df['name']) # Access the 'name' column
print(df.name) # Another way to access the 'name' column
# Output:
# 0      Alice
# 1        Bob
# 2    Charlie
# 3      David
# Name: name, dtype: object
# Accessing rows in a DataFrame
print(df.iloc[0]) # Access the first row by index (integer location)
print(df.loc[0]) # Access the first row by label (index label)
# Output:
# name      Alice
# age          25
# gender        F
# Name: 0, dtype: object
# Adding a new column to a DataFrame
df['city'] = ['New York', 'Los Angeles', 'Chicago', 'Houston']
print(df)
# Output:
#       name  age gender         city
# 0    Alice   25      F     New York
# 1      Bob   32      M  Los Angeles
# 2  Charlie   18      M      Chicago
# 3    David   47      M      Houston
# Filtering rows in a DataFrame based on a condition
filtered_df = df[df['age'] > 30]
print(filtered_df)
# Output:
#     name  age gender         city
# 1    Bob   32      M  Los Angeles
# 3  David   47      M      Houston
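# Grouping data in a DataFrame by a column
# numeric_only=True averages only numeric columns; recent Pandas versions
# raise an error if text columns such as 'name' and 'city' are included
grouped_df = df.groupby('gender').mean(numeric_only=True)
print(grouped_df)
# Output:
#               age
# gender
# F       25.000000
# M       32.333333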
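The Pandas overview earlier also lists sorting, merging, and reshaping, which the example above does not cover. The following is a minimal sketch of each; the cities table is a hypothetical second dataset introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "age": [25, 32, 18, 47],
    "gender": ["F", "M", "M", "M"],
})

# Sorting: order rows by age, oldest first
by_age = df.sort_values("age", ascending=False)
print(by_age["name"].tolist())   # ['David', 'Bob', 'Alice', 'Charlie']

# Merging: left-join a second (hypothetical) table on the shared 'name' column
cities = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "city": ["New York", "Los Angeles", "Chicago"],
})
merged = df.merge(cities, on="name", how="left")  # David has no match, so city is NaN
print(merged)

# Reshaping: melt turns the 'age' and 'gender' columns into (variable, value) rows
long_df = df.melt(id_vars="name", value_vars=["age", "gender"])
print(long_df.shape)             # (8, 3)
```

A left join keeps every row of the left table, which is why David still appears in the merged result with a missing city.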