layout: post
title: "ML4T-Notes---01-03-The-power-of-NumPy"
date: "2019-01-10 10:01:01"
categories: Computer Science
excerpt: "NumPy:"
NumPy:
If you're familiar with NumPy (esp. the following operations), feel free to skim through this lesson.
- Create a NumPy array:
    - from a pandas DataFrame: pandas.DataFrame.values
    - from a Python sequence: numpy.array
    - with constant initial values: numpy.ones, numpy.zeros
    - with random values: numpy.random
- Access array attributes: shape, ndim, size, dtype
- Compute statistics: sum, min, max, mean
- Carry out arithmetic operations: add, subtract, multiply, divide
- Measure execution time: time.time, profile
- Manipulate array elements: using simple indices and slices, integer arrays, boolean arrays
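Pandas is built on top of NumPy: you can think of a DataFrame as a wrapper around an ndarray, and DataFrame.values returns the underlying data as an ndarray.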
How to access cells within an ndarray:
nd1[row, col]
To select a block of cells, use slices:
nd1[0:3, 1:3]
[0:3, 1:3] selects rows starting at row 0 up to, but not including, row 3, and columns starting at column 1 up to, but not including, column 3. The last value of a slice is one past the last index you actually want to include.
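A small sketch of the slicing rule, assuming a hypothetical 4x5 array filled with 0..19 so that every position is easy to read off:

```python
import numpy as np

# Hypothetical 4x5 array holding the values 0..19
nd1 = np.arange(20).reshape(4, 5)

# Rows 0, 1, 2 (stop before 3) and columns 1, 2 (stop before 3)
sub = nd1[0:3, 1:3]
print(sub)
# [[ 1  2]
#  [ 6  7]
#  [11 12]]
```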
Suppose we have two ndarrays, nd1 and nd2, and we want to replace some of the values in nd1 with values from nd2.
Which of the candidate statements are correct?
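The key constraint when copying values between arrays with slices is that both sides must have the same shape. A minimal sketch with hypothetical arrays (the quiz used its own values):

```python
import numpy as np

# Hypothetical arrays, not the ones from the quiz
nd1 = np.zeros((4, 5), dtype=int)
nd2 = np.arange(1, 21).reshape(4, 5)

# Copy the bottom-left 2x2 block of nd2 into the top-left 2x2 block of nd1;
# both slices have shape (2, 2), so the assignment works
nd1[0:2, 0:2] = nd2[-2:, 0:2]
print(nd1)

# A (3, 2) slice cannot be assigned into a (2, 2) slice
try:
    nd1[0:2, 0:2] = nd2[0:3, 0:2]
except ValueError as err:
    print("shape mismatch:", err)
```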
Creating a one-dimensional array from known values: use the np.array function, which converts most array-like objects (such as lists and tuples) into an ndarray.
To create a 2D array, pass a sequence of sequences to np.array.
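For example, with arbitrary illustrative values:

```python
import numpy as np

# 1D array from a Python list of known values
a1 = np.array([2, 3, 4])

# 2D array from a sequence of sequences (a list of tuples here)
a2 = np.array([(2, 3, 4), (5, 6, 7)])

print(a1.shape, a2.shape)  # (3,) (2, 3)
```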
np.empty() takes the shape of the array as input and allocates the array without initializing its values, so the contents are whatever happened to be in memory. Next, we create an array full of ones using np.ones(): np.ones((5, 4)) creates an array of 5 rows and 4 columns with all the values equal to 1.
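A short sketch of the three constructors, using the same 5x4 shape as above:

```python
import numpy as np

# np.empty only allocates memory; its values are arbitrary until assigned
e = np.empty((5, 4))

# np.ones and np.zeros fill every element; the default dtype is float64
o = np.ones((5, 4))
z = np.zeros((5, 4))

print(e.shape, o.dtype)  # (5, 4) float64
```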
What parameter do you need to add to this function to create an array of integers instead?
The documentation for numpy.ones() might be helpful.
Documentation: numpy.ones
NumPy User Guide: Data types
Documentation:
- numpy.empty
- numpy.ones
- numpy.zeros
- numpy.array
- numpy.ndarray (direct ndarray constructor)
Answer: dtype is the parameter. Pass dtype=np.int_ to make the values integers, using the NumPy data type np.int_.
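A minimal check of the answer. The exact integer width behind np.int_ depends on the platform and NumPy version, so the printed dtype is not shown here:

```python
import numpy as np

# Pass dtype to get integers instead of the default float64
ones_int = np.ones((5, 4), dtype=np.int_)
print(ones_int.dtype)
```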
Generating random values: np.random.random(), np.random.rand(), np.random.normal()
- np.random.random() generates uniformly sampled floating-point values in [0.0, 1.0).
- np.random.rand(5, 4) is equivalent to np.random.random((5, 4)): rand takes the dimensions as separate arguments instead of a single shape tuple. NumPy provides this form for compatibility with MATLAB syntax.
- np.random.normal() samples from a normal (Gaussian) distribution. It also accepts the mean (loc) and standard deviation (scale) of the distribution as input.
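A sketch comparing the two uniform samplers and calling normal with explicit parameters. The seed 0 and the values loc=50, scale=10 are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)
u1 = np.random.random((5, 4))   # shape passed as one tuple
np.random.seed(0)
u2 = np.random.rand(5, 4)       # same shape passed as separate arguments

# With the same seed, both calls draw the same uniform values in [0.0, 1.0)
print(np.array_equal(u1, u2))

# Normal distribution with mean (loc) 50 and standard deviation (scale) 10
n = np.random.normal(loc=50, scale=10, size=(2, 3))
print(n)
```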
To generate integers, use the np.random.randint() function:
np.random.randint(0, 10)
generates a single integer in the range [0, 10), i.e. 0 through 9, because the upper bound is exclusive.
np.random.randint(0, 10, size=5)
generates 5 integers in the same range.
NumPy Reference: Random sampling
Sampling functions:
- numpy.random.random: Samples a Uniform distribution in [0.0, 1.0)
- numpy.random.rand: Like random, but slightly different syntax
- numpy.random.normal: Normal or Gaussian distribution
- numpy.random.randint: Integers from Uniform distribution
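The randint calls side by side; the 2x3 shape in the last line is an extra illustration, not from the lecture:

```python
import numpy as np

single = np.random.randint(0, 10)             # one integer in [0, 10)
five = np.random.randint(0, 10, size=5)       # 1D array of 5 integers
grid = np.random.randint(0, 10, size=(2, 3))  # 2x3 array of integers

print(single, five, grid, sep="\n")
```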
Attributes like size and shape are very useful when you have to iterate over array elements to perform some computation.
Given an ndarray a = np.random.random((5, 4)):
- a.shape returns the shape as a tuple: (5, 4)
- a.shape[0] returns the number of rows (5)
- a.shape[1] returns the number of columns (4)
- len(a.shape) and a.ndim return the number of dimensions; a has 2 dimensions
- a.size returns the total number of elements in the array (20)
- a.dtype returns the data type of the values in a
Attributes of numpy.ndarray:
- numpy.ndarray.shape: Dimensions (height, width, ...)
- numpy.ndarray.ndim: No. of dimensions = len(shape)
- numpy.ndarray.size: Total number of elements
- numpy.ndarray.dtype: Datatype
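The attributes in one place, plus a sketch of using shape to loop over every element (a NumPy reduction like a.sum() does the same job much faster):

```python
import numpy as np

a = np.random.random((5, 4))
print(a.shape)               # (5, 4)
print(a.shape[0])            # 5 rows
print(a.shape[1])            # 4 columns
print(a.ndim, len(a.shape))  # 2 2
print(a.size)                # 20
print(a.dtype)               # float64

# Using shape to visit every element with explicit loops
total = 0.0
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        total += a[i, j]
print(total, a.sum())
```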
Time: 00:02:33
Mathematical operations on NumPy arrays.
Use seed to make the random numbers reproducible:
import numpy as np
np.random.seed(693)
a = np.random.randint(0, 10, size=(5, 4))
The output is an array with five rows and four columns, with all values in the range [0, 10). Because seed initializes the random number generator with a constant, we get the same sequence of numbers every time the code runs.
- a.sum(): sum of all the elements in the array
- a.sum(axis=0): sum of each column (summing down the rows)
- a.sum(axis=1): sum of each row (summing across the columns)
- a.min(axis=0): minimum of each column
- a.max(axis=1): maximum of each row
- a.mean(): mean of the entire array
NumPy Reference: Mathematical functions
- numpy.sum: Sum of elements - along rows, columns or all
- numpy.min, numpy.max, numpy.mean: Simple statistics
Also: numpy.random.seed to (re)set the random number generator.
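The statistics above on the seeded array. The actual printed numbers depend on NumPy's generator, so they are not reproduced here:

```python
import numpy as np

np.random.seed(693)
a = np.random.randint(0, 10, size=(5, 4))

print(a.sum())        # sum of all 20 elements
print(a.sum(axis=0))  # 4 values: one sum per column
print(a.sum(axis=1))  # 5 values: one sum per row
print(a.min(axis=0))  # minimum of each column
print(a.max(axis=1))  # maximum of each row
print(a.mean())       # mean of the entire array
```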
Time: 00:03:40
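For the quiz on locating the maximum value, the answer could also be written as return a.argmax(), which returns the index of the maximum value (for a multidimensional array, the index into the flattened array).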
NumPy Reference: Sorting, searching, and counting
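A sketch of the argmax answer. The helper name get_max_index and the sample values are hypothetical; np.unravel_index converts a flat index back into (row, col) for 2D arrays:

```python
import numpy as np

def get_max_index(a):
    """Hypothetical helper: index of the maximum value in a 1D array."""
    return a.argmax()

a = np.array([9, 6, 2, 3, 12, 14, 7, 10])
print(get_max_index(a))  # 5, since a[5] == 14

b = np.array([[1, 8], [7, 3]])
flat = b.argmax()                          # 1, index into the flattened array
row_col = np.unravel_index(flat, b.shape)  # (0, 1)
print(flat, row_col)
```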
The time module can tell us how long an operation takes: capture a time snapshot before and after the operation is performed and subtract the two times.
Time: 00:00:56
Documentation:
- time.time: Time in seconds, as a floating-point number
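A minimal sketch of the before/after snapshot pattern; the 1000x1000 array is an arbitrary workload:

```python
import time
import numpy as np

a = np.random.random((1000, 1000))

t1 = time.time()  # snapshot before the operation
total = a.sum()   # operation being timed
t2 = time.time()  # snapshot after the operation

print("Elapsed: {:.6f} seconds".format(t2 - t1))
```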
The next video demonstrates how fast NumPy can perform certain operations; I will skip it here. All you need to know is that NumPy is fast: its vectorized operations are much faster than equivalent pure-Python loops.
Documentation:
- time.time: Current time in seconds (float value)
- timeit: Average execution time measurement
- profile: Code profiling
- IPython "magics"
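Since the speed demo is skipped above, here is a rough sketch of that kind of comparison using timeit. The helper manual_mean is hypothetical, and actual timings depend on the machine:

```python
import timeit
import numpy as np

a = np.random.random((100, 100))

def manual_mean(arr):
    """Hypothetical helper: mean computed with plain Python loops."""
    total = 0.0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            total += arr[i, j]
    return total / arr.size

# Both approaches compute the same value (up to floating-point rounding)
print(manual_mean(a), a.mean())

# timeit.timeit calls the function `number` times and returns total seconds
loop_time = timeit.timeit(lambda: manual_mean(a), number=10)
numpy_time = timeit.timeit(lambda: a.mean(), number=10)
print("loop: {:.4f}s  numpy: {:.4f}s".format(loop_time, numpy_time))
```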
a[3, 2]
a[0:2, 0:2]
a[:, 0:3:2]
The slice n:m:t gives values from index n up to, but not including, m, in steps of size t. Hence a[:, 0:3:2] returns the values of column 0, skips column 1, and then returns the values of column 2.
NumPy Reference: Indexing
Note: Indexing starts at 0 (zero).
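The three indexing expressions on a hypothetical 5x4 array holding 0..19:

```python
import numpy as np

a = np.arange(20).reshape(5, 4)  # hypothetical 5x4 array with values 0..19

print(a[3, 2])      # element at row 3, column 2 -> 14
print(a[0:2, 0:2])  # rows 0-1, columns 0-1
print(a[:, 0:3:2])  # all rows, columns 0 and 2 (step 2)
```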
Time: 00:02:29
a[0, 0] = 1
This accesses the element at position (0, 0) in a and uses the assignment operator = to assign the value 1 to it.
a[0, :] = 2
assigns the value 2 to the entire first row.
a[:, 3] = [1, 2, 3, 4, 5]
assigns a list of values to a column (a row works the same way); the list length must match the number of elements selected, here the 5 rows of column 3.
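The three assignments run in order; note that the row assignment overwrites the single-element assignment at (0, 0), and the column assignment then overwrites a[0, 3]:

```python
import numpy as np

a = np.random.random((5, 4))
a[0, 0] = 1                # single element
a[0, :] = 2                # entire row 0, so a[0, 0] becomes 2
a[:, 3] = [1, 2, 3, 4, 5]  # column 3 gets one value per row (5 rows)
print(a)
```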
Time: 00:01:32
When indexing with an array of integers, for example indices = np.array([1, 1, 2, 3]) and a[indices], the returned array has the same length as the indices array. Here it returns the values of a at indices 1, 1, 2, and 3.
NumPy Reference: Indexing
- Integer array indexing: Select array elements with another array
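Integer array indexing in a few lines; the random 1D array a is illustrative:

```python
import numpy as np

a = np.random.rand(5)
indices = np.array([1, 1, 2, 3])

picked = a[indices]   # [a[1], a[1], a[2], a[3]]
print(picked.shape)   # (4,), same length as indices
```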
Time: 00:01:33
a[a < mean]
For each value in array a, compare it with mean (e.g. mean = a.mean()); if it is less, the value is retained in the result.
a[a < mean] = mean
replaces those values with the mean value.
NumPy Reference: Indexing
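A sketch of boolean masking with hypothetical float values; a float array is used so that assigning the mean is not truncated, as it would be in an integer array:

```python
import numpy as np

a = np.array([[2.0, 8.0], [4.0, 6.0]])  # hypothetical values, mean is 5.0
mean = a.mean()

below = a[a < mean]  # 1D array of values below the mean: [2. 4.]
print(below)

a[a < mean] = mean   # replace them in place
print(a)             # [[5. 8.] [5. 6.]]
```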
Time: 00:01:47
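2 * a
Multiplication by a scalar is element-wise: every value is doubled.
a / 2.0
In Python 2, if the array and the divisor are both integers, / performs integer division and the output is also integer; using 2.0 instead of 2 as the divisor gives float values. In Python 3, / always performs true division and returns floats, and // gives floor (integer) division.
a + b
a * b
a / b
These operations are also element-wise; a * b is not matrix multiplication. As before, under Python 2, if arrays a and b are both integers, a / b is also integer, so convert one of the arrays to float to get float results. Under Python 3 the result is already float.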
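Element-wise arithmetic under Python 3, with hypothetical integer arrays; the astype line shows the explicit conversion the notes suggest for Python 2:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])
b = np.array([[100, 200, 300, 400, 500], [1, 2, 3, 4, 5]])

print(2 * a)                # element-wise: every value doubled
print(a / 2)                # Python 3: true division, float64 result
print(a // 2)               # floor division keeps integers
print(a + b)                # element-wise addition
print(a * b)                # element-wise product, not matrix multiplication
print(a / b)                # float64 in Python 3
print(a / b.astype(float))  # explicit conversion to float
```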
Resources from NumPy User Guide and Reference:
Time: 00:00:16
Total Time: 00:35:59
First Draft 2019-01-10