Python is increasingly being used as a scientific language. Matrix and vector manipulations are extremely important for scientific computations. Both NumPy and Pandas have emerged to be essential libraries for any scientific computation, including machine learning, in python due to their intuitive syntax and high-performance matrix computation capabilities.
In the above code, we created a pandas DataFrame object, a tabular data structure that resembles a spreadsheet like those used in Excel.For those familiar with SQL, you can view a DataFrame as an SQL table.The DataFrame we created consists of four columns, each with entries of different data types (integer, float, string, and Boolean). NumPy and pandas.
![]()
In this post, we will provide an overview of the common functionalities of NumPy and Pandas. We will realize the similarity of these libraries with existing toolboxes in R and MATLAB. This similarity and added flexibility have resulted in wide acceptance of python in the scientific community lately. Topic covered in the blog are:
This post is an excerpt from a live hands-on training conducted by CloudxLab on 25th Nov 2017. It was attended by more than 100 learners around the globe. The participants were from countries namely; United States, Canada, Australia, Indonesia, India, Thailand, Philippines, Malaysia, Macao, Japan, Hong Kong, Singapore, United Kingdom, Saudi Arabia, Nepal, & New Zealand.
What is NumPy?
NumPy stands for ‘Numerical Python’ or ‘Numeric Python’. It is an open source module of Python which provides fast mathematical computation on arrays and matrices. Since, arrays and matrices are an essential part of the Machine Learning ecosystem, NumPy along with Machine Learning modules like Scikit-learn, Pandas, Matplotlib, TensorFlow, etc. complete the Python Machine Learning Ecosystem.
NumPy provides the essential multi-dimensional array-oriented computing functionalities designed for high-level mathematical functions and scientific computation. Numpy can be imported into the notebook using
NumPy’s main object is the homogeneous multidimensional array. It is a table with same type elements, i.e, integers or string or characters (homogeneous), usually integers. In NumPy, dimensions are called axes. The number of axes is called the rank.
There are several ways to create an array in NumPy like np.array, np.zeros, no.ones, etc. Each of them provides some flexibility. Samsung switch mac os x 10.7 downloadwnload free.
Some of the important attributes of a NumPy object are:
NumPy array elements can be accessed using indexing. Below are some of the useful examples:
The session covers these and some important attributes of the NumPy array object in detail.
Vectors and Machine learning
Machine learning uses vectors. Vectors are one-dimensional arrays. It can be represented either as a row or as a column array.
What are vectors? Vector quantity is the one which is defined by a magnitude and a direction. For example, force is a vector quantity. It is defined by the magnitude of force as well as a direction. It can be represented as an array [a,b] of 2 numbers = [2,180] where ‘a’ may represent the magnitude of 2 Newton and 180 (‘b’) represents the angle in degrees.
Another example, say a rocket is going up at a slight angle: it has a vertical speed of 5,000 m/s, and also a slight speed towards the East at 10 m/s, and a slight speed towards the North at 50 m/s. The rocket’s velocity may be represented by the following vector: [10, 50, 5000] which represents the speed in each of x, y, and z-direction.
Similarly, vectors have several usages in Machine Learning, most notably to represent observations and predictions.
For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them. For each video, we would have a vector representing what we know about it, such as: [10.5, 5.2, 3.25, 7.0]. This vector could represent a video that lasts 10.5 minutes, but only 5.2% viewers watch for more than a minute, it gets 3.25 views per day on average, and it was flagged 7 times as spam.
As you can see, each axis may have a different meaning. Based on this vector, our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector: class_probabilities = [0.8,0.18,0.02].
As can be observed, vectors can be used in Machine Learning to define observations and predictions. The properties representing the video, i.e., duration, percentage of viewers watching for more than a minute are called features.
Since the majority of the time of building machine learning models would be spent in data processing, it is important to be familiar to the libraries that can help in processing such data.
Why NumPy and Pandas over regular Python arrays?
In python, a vector can be represented in many ways, the simplest being a regular python list of numbers. Since Machine Learning requires lots of scientific calculations, it is much better to use NumPy’s ndarray, which provides a lot of convenient and optimized implementations of essential mathematical operations on vectors.
Vectorized operations perform faster than matrix manipulation operations performed using loops in python. For example, to carry out a 100 * 100 matrix multiplication, vector operations using NumPy are two orders of magnitude faster than performing it using loops.
Some ways in which NumPy arrays are different from normal Python arrays are:
So, it is easier to assign values to a slice of an array in a NumPy array as compared to a normal array wherein it may have to be done using loops.
How To Download Numpy And Pandas On Mac 10
If we need a copy of the NumPy array, we need to use the copy method as another_slice = another_slice = a[2:6].copy(). If we modify another_slice, a remains same
Array[row_start_index:row_end_index, column_start_index: column_end_index]
NumPy arrays can also be accessed using boolean indexing. For example,
NumPy arrays are capable of performing all basic operations such as addition, subtraction, element-wise product, matrix dot product, element-wise division, element-wise modulo, element-wise exponents and conditional operations.
An important feature with NumPy arrays is broadcasting.
In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called broadcasting rules.
Basically, there are 2 rules of Broadcasting to remember:
NumPy provides basic mathematical and statistical functions like mean, min, max, sum, prod, std, var, summation across different axes, transposing of a matrix, etc.
A particular NumPy feature of interest is solving a system of linear equations. NumPy has a function to solve linear equations. For example,
Can be solved in NumPy using
What is Pandas?
Kodak esp 3200 series aio driver download mac. Similar to NumPy, Pandas is one of the most widely used python libraries in data science. It provides high-performance, easy to use structures and data analysis tools. Unlike NumPy library which provides objects for multi-dimensional arrays, Pandas provides in-memory 2d table object called Dataframe. It is like a spreadsheet with column names and row labels.
Hence, with 2d tables, pandas is capable of providing many additional functionalities like creating pivot tables, computing columns based on other columns and plotting graphs. Pandas can be imported into Python using:
Some commonly used data structures in pandas are:
Pandas Series object is created using pd.Series function. Each row is provided with an index and by defaults is assigned numerical values starting from 0. Like NumPy, Pandas also provide the basic mathematical functionalities like addition, subtraction and conditional operations and broadcasting.
Pandas dataframe object represents a spreadsheet with cell values, column names, and row index labels. Dataframe can be visualized as dictionaries of Series. Dataframe rows and columns are simple and intuitive to access. Pandas also provide SQL-like functionality to filter, sort rows based on conditions. For example,
New columns and rows can be easily added to the dataframe. In addition to the basic functionalities, pandas dataframe can be sorted by a particular column.
How To Get Numpy
Dataframes can also be easily exported and imported from CSV, Excel, JSON, HTML and SQL database. Some other essential methods that are present in dataframes are:
Install Numpy
What is matplotlib?
Matplotlib is a 2d plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments. Matplotlib can be used in Python scripts, Python and IPython shell, Jupyter Notebook, web application servers and GUI toolkits.
matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Majority of plotting commands in pyplot have MATLAB analogs with similar arguments. Let us take a couple of examples:
How To Download Numpy And Pandas On Mac Os
Summary
Hence, we observe that NumPy and Pandas make matrix manipulation easy. This flexibility makes them very useful in Machine Learning model development.
How To Download Numpy And Pandas On Mac Download
Check out the free course on Python for Machine Learning by CloudxLab. You can find the in-depth video tutorials on NumPy, Pandas, and Matplotlib in the course.
Comments are closed.
|
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |