Basics of Numpy — Python for Data Analysis

Vaibhav Sharma
7 min read · Sep 30, 2018

NumPy

On the internet, everything is data, and nothing is complete without it. That is why demand for data scientists and machine learning developers is currently very high, and salaries for these roles are at a peak. But if you are a student or a developer who wants to learn these technologies, the first barrier is the terminology: Artificial Intelligence, Machine Learning, and Data Science. Most newcomers spend a lot of time just trying to tell these fields apart. On top of that, it is hard to find one clear piece of documentation online for learning them. After talking to many friends, searching the internet, and going through job posts for data scientists and machine learning developers, I concluded that NumPy is the place to start with machine learning.

Introduction of Numpy :

NumPy is a Python library for scientific computing. It provides multi-dimensional arrays and matrices, together with a large collection of high-level mathematical functions that operate on them. It is a scientific computing package for Python and contains, among other things:

  1. Powerful N-dimensional array object.
  2. Broadcasting functions.
  3. Tools for integrating C/C++ and Fortran code.
  4. Useful linear algebra, Fourier transform, and random number capabilities.

N-Dimensional Array ( ndarray ):

An ndarray is a container for a multi-dimensional array of items that all have the same type and size. The number of dimensions and the number of items along each dimension are given by its shape, and the type of the items is given by its dtype. As with other Python container objects, the contents of an ndarray can be accessed and modified by indexing or slicing.

ndarray
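As a minimal sketch of these ideas, the snippet below builds a small 2-dimensional ndarray and inspects its shape and dtype, then reads and modifies items by indexing and slicing. The variable names are only illustrative and do not come from the original figure.

```python
import numpy as np

# A 2-D ndarray: every item shares one dtype
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)  # (2, 3): 2 rows, 3 columns
print(a.ndim)   # 2
print(a.dtype)  # an integer dtype; the exact width depends on the platform

# Access items by indexing and slicing
first_row = a[0]       # the first row: 1, 2, 3
second_col = a[:, 1]   # the second column: 2, 5

# Modify a single item in place
a[1, 2] = 60
print(a)
```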

Data types in Numpy :

Data types in Numpy
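The original table of data types is not reproduced here. As a rough illustration only, the sketch below creates arrays with a few common NumPy dtypes (chosen for this example, not taken from the table) and shows how to convert between them with astype.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)         # 32-bit signed integers
floats = np.array([1, 2, 3], dtype=np.float64)     # 64-bit floating point
flags = np.array([0, 1, 2], dtype=np.bool_)        # booleans: zero is False, non-zero is True
cplx = np.array([1, 2], dtype=np.complex128)       # complex numbers

print(ints.dtype, ints.itemsize)      # int32 4   (itemsize is in bytes)
print(floats.dtype, floats.itemsize)  # float64 8
print(flags)                          # [False  True  True]

# astype returns a converted copy and leaves the original untouched
as_float32 = ints.astype(np.float32)
print(as_float32.dtype)               # float32
```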

Import Numpy package :

Before you start coding with NumPy, install the package with the Python package manager (pip) using the command below:

pip install numpy

Once the package has installed successfully, you can import it as np.

import numpy as np
Import numpy

Creating array using Numpy :

np.array( ) :

create basic array

np.array() creates an array in NumPy. Its full signature is shown below:

array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)

Of these arguments, only object is required; all the others are optional.

If you want to read a detailed article on NumPy array functions, please refer to Mastering in Numpy — Part 1 (Coming soon)
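A small sketch of calling np.array() with and without some of the optional arguments is shown here. The input lists and variable names are made up for illustration.

```python
import numpy as np

# Only the required argument: object
only_object = np.array([1, 2, 3])

# dtype forces the element type
as_float = np.array([1, 2, 3], dtype=float)

# ndmin guarantees a minimum number of dimensions
at_least_2d = np.array([1, 2, 3], ndmin=2)

print(only_object.shape)   # (3,)
print(as_float.dtype)      # float64
print(at_least_2d.shape)   # (1, 3)
```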

dtype
dtype - data type

The dtype argument is optional and specifies the desired data type of the array. If it is not given, the type is determined as the minimum type required to hold the objects in the sequence.

without dtype

For example, if a 2-d array is created without dtype and just one of its values is a float (such as 1.5), np.array() determines that a float type is needed, so the dtype of the whole array is float.
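The behaviour can be sketched as follows. The array contents are an assumption (only the value 1.5 is mentioned in the text), so treat this as an illustration rather than the original example.

```python
import numpy as np

# One float (1.5) among integers: the whole array becomes float
mixed = np.array([[1, 2, 3], [4, 1.5, 6]])
print(mixed.dtype)   # float64
print(mixed)

# Passing dtype explicitly overrides the inferred type
forced = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(forced.dtype)  # float32
```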

np.arange( ) :

np.arange( )

np.arange( ) is a convenient way to create large arrays of evenly spaced values. By default, np.arange( ) creates a one-dimensional array; if you want a 2-dimensional or 3-dimensional matrix, you can combine np.arange( ) with np.reshape( ).
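As a quick sketch, combining np.arange( ) with reshape might look like this. The sizes 12 and 24 are arbitrary choices for the example.

```python
import numpy as np

flat = np.arange(12)                   # 1-D: 0, 1, ..., 11
matrix = flat.reshape(3, 4)            # 2-D: 3 rows, 4 columns
cube = np.arange(24).reshape(2, 3, 4)  # 3-D: 2 blocks of 3 x 4

print(flat.shape)    # (12,)
print(matrix.shape)  # (3, 4)
print(cube.shape)    # (2, 3, 4)
```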

np.arange(start=None, stop=None, step=None, dtype=None)

np.arange ( ) has four parameters:

  1. start — the first value of the array.
  2. stop — the end of the interval; the stop value itself is excluded.
  3. step — the spacing between consecutive values.
  4. dtype — the data type of the resulting array.

np.arange(0, 10, 3)
>> array([0, 3, 6, 9])
===============================
start = 0
stop = 10
step = 3

If you do not pass a step value, it defaults to 1.

the default value of step is 1

start, step, and dtype are optional parameters of np.arange( ). If start is not passed as an argument, it defaults to 0.

the default value of start is 0
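Both defaults can be checked with a short sketch like the one below. The numbers are picked only for illustration.

```python
import numpy as np

# step omitted: defaults to 1
default_step = np.arange(2, 6)
print(default_step)    # [2 3 4 5]

# start and step omitted: start defaults to 0, step to 1
default_start = np.arange(5)
print(default_start)   # [0 1 2 3 4]
```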

If dtype is not given, the data type is inferred from the other arguments of the function.

Take dtype from ‘step’ parameter

For example, when dtype is not given, np.arange( ) chooses it by examining the types of the other parameter values. If the step value is a float, the dtype of the resulting array is also float.
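A minimal sketch of this inference, assuming a step of 0.5 (the value used in the original figure is not known):

```python
import numpy as np

# Integer arguments give an integer array
int_range = np.arange(0, 4)
print(int_range.dtype.kind)   # 'i' (a signed integer type)

# A float step makes the whole result float
float_step = np.arange(0, 2, 0.5)
print(float_step)             # [0.  0.5 1.  1.5]
print(float_step.dtype)       # float64

# An explicit dtype overrides the inference
explicit = np.arange(0, 4, dtype=np.float32)
print(explicit.dtype)         # float32
```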

np.linspace( ) :

np.linspace ( )

np.linspace( ) is a convenient way to create an array of any size whose values are evenly spaced over a specified interval. By default, np.linspace( ) creates a one-dimensional array; if you want a 2-dimensional or 3-dimensional matrix, you can combine np.linspace( ) with np.reshape( ).

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

np.linspace ( ) has 6 parameters :

  1. start — the starting value of the sequence.
  2. stop — the last value of the sequence.
  3. num — the number of samples to generate.
  4. endpoint — a boolean value. If endpoint is True, stop is the last sample of the sequence. The default value of endpoint is True.
  5. retstep — a boolean value. The default value is False. If retstep is True, the function returns both the samples and the step (the difference between two consecutive samples).
  6. dtype — the data type of the resulting array.

np.linspace(5., 10., 3)
array([ 5. ,  7.5, 10. ])
============================
start = 5.
stop = 10.
num = 3

If you do not pass a num value, it defaults to 50.

np.linspace( ) without ‘num’ value
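A short sketch of the default, using the interval 0 to 1 as an arbitrary example:

```python
import numpy as np

# num omitted: 50 samples are generated
samples = np.linspace(0, 1)

print(samples.size)             # 50
print(samples[0], samples[-1])  # 0.0 1.0
```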

The next parameter of np.linspace( ) is endpoint, which takes a boolean value. If it is True, the stop value is the last sample of the array. By default, endpoint is True. When endpoint is set to False, the resulting array does not include the stop value as its last sample.

np.linspace( ) with endpoint=False
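The difference can be sketched like this, with an interval and sample count chosen only for illustration:

```python
import numpy as np

# Default (endpoint=True): stop is the last sample
with_stop = np.linspace(0, 10, 5)
print(with_stop)     # [ 0.   2.5  5.   7.5 10. ]

# endpoint=False: stop is excluded, so the spacing changes as well
without_stop = np.linspace(0, 10, 5, endpoint=False)
print(without_stop)  # [0. 2. 4. 6. 8.]
```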

The next parameter of np.linspace( ) is retstep, which takes a boolean value. If it is True, the result also includes the step, that is, the difference between two consecutive samples. By default, retstep is False.

np.linspace( ) with retstep=True
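With retstep=True the function returns a tuple, which can be unpacked as in this sketch (it reuses the start, stop, and num values from the earlier np.linspace(5., 10., 3) call):

```python
import numpy as np

# retstep=True returns (samples, step) instead of samples alone
samples, step = np.linspace(5., 10., 3, retstep=True)

print(samples)  # [ 5.   7.5 10. ]
print(step)     # 2.5
```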

If dtype is not given, the data type is inferred from start and stop. The inferred type is always a float, even when both arguments are integers.
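A brief sketch of how the result type behaves with and without an explicit dtype (the values are illustrative):

```python
import numpy as np

# Integer start and stop still give a float array
inferred = np.linspace(0, 10, 5)
print(inferred.dtype)  # float64

# An explicit integer dtype converts the samples to whole numbers
as_int = np.linspace(0, 10, 5, dtype=int)
print(as_int)          # [ 0  2  5  7 10]
```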

np.reshape( ) :

np.reshape( )

np.reshape( ) is one of the most important functions in the NumPy library because it converts a 1-dimensional array into an n-dimensional matrix, for example a 1-dimensional array into a 2-dimensional matrix. The first point to remember when using np.reshape( ) is that the product of its parameters must equal the number of elements in the 1-dimensional array.

np.arange(0, 10, 3)
array([0, 3, 6, 9])
===================
Total number of array elements = 4
===========================================
np.arange(0, 10, 3).reshape(2, 2)
array([[0, 3],
       [6, 9]])
======================
Product of reshape parameters = 2*2 = 4
===================================================
Total number of array elements == Product of reshape parameters
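To see what happens when this rule is broken, here is a small sketch built on the same np.arange(0, 10, 3) array. The use of -1 to let NumPy work out one dimension goes beyond the original text and is included only as a convenience.

```python
import numpy as np

a = np.arange(0, 10, 3)       # 4 elements: 0, 3, 6, 9

ok = a.reshape(2, 2)          # 2 * 2 == 4, so this works
inferred = a.reshape(-1, 2)   # -1 asks NumPy to compute that dimension

# 3 * 2 == 6 does not match 4 elements, so NumPy raises ValueError
try:
    a.reshape(3, 2)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(ok)
print(inferred.shape)         # (2, 2)
print(mismatch_error)
```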

np.zeros( ) :

np.zeros()

np.zeros( ) creates a matrix in which every element is 0.
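A minimal sketch, assuming a shape of 2 rows and 3 columns:

```python
import numpy as np

zeros = np.zeros((2, 3))   # the shape is passed as a tuple

print(zeros)
print(zeros.dtype)         # float64 by default
```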

np.ones( ) :

np.ones( )

np.ones( ) creates a matrix in which every element is 1.
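A minimal sketch, again with an arbitrary shape, that also shows the optional dtype argument:

```python
import numpy as np

ones = np.ones((3, 2))                 # float64 by default
int_ones = np.ones((3, 2), dtype=int)  # integer ones

print(ones)
print(int_ones.dtype.kind)             # 'i'
```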

np.full( ) :

np.full( )

If you want an array filled with a constant value, use np.full( ). Every sample in the result is that constant. For example, with a fill value of 7, every sample of the resulting array is 7.
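As a sketch, with the fill value 7 from the text and an assumed 2 by 2 shape:

```python
import numpy as np

# First argument is the shape, second is the fill value
sevens = np.full((2, 2), 7)

print(sevens)
# [[7 7]
#  [7 7]]
```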

np.eye( ) :

np.eye( )

np.eye( ) returns an identity matrix: a 2-dimensional matrix with ones on the main diagonal and zeros everywhere else.
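A short sketch with assumed sizes of 2 and 3:

```python
import numpy as np

identity = np.eye(2)                # 2 x 2, float64 by default
int_identity = np.eye(3, dtype=int)

print(identity)
# [[1. 0.]
#  [0. 1.]]
print(int_identity.shape)           # (3, 3)
```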

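np.random :

np.random

np.random is a module rather than a single function; its functions, such as np.random.random( ), generate random samples. For example, requesting the shape (2, 3) gives a 2-dimensional matrix of 2*3 (number of rows = 2 and number of columns = 3) filled with random values.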
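A sketch of creating such a matrix is shown below. Which function of the np.random module the original example called is not known, so np.random.random is an assumption here, and the seed is set only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # optional: makes the random values repeatable

# 2 rows, 3 columns of floats drawn uniformly from [0.0, 1.0)
sample = np.random.random((2, 3))

print(sample.shape)  # (2, 3)
print(sample)
```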

Conclusion :

This is the very first tutorial on the NumPy library, which is why it covers only the basics. The second instalment of this series covers arithmetic operations on matrices using NumPy. Please refer to the second tutorial of this series: Arithmetic operations in Numpy — Python for Data Analysis