Data is the new oil in today’s scenario
What this tutorial will cover
What is Pandas
Capabilities of Pandas
How to install pandas
How to load different file formats using pandas
Removing NaN Values using Pandas
Data Manipulation using pandas
What is pandas
Pandas is a high-level abstraction over the lower-level NumPy library. Its eloquent syntax and extensive functionality make it my choice for data analysis and data manipulation. One more point that explains its enormous usage is that pandas is open source. It is a NumFOCUS project. Numerous NumFOCUS open source projects are used by NASA and many others.
Capabilities of Pandas
1.) Reading files in different formats such as JSON, CSV, and many others.
2.) Manipulating data and performing calculations on rows and columns of data.
3.) Reshaping the data and removing abnormalities from it, such as undefined (NaN) values.
4.) Visualizing the data using matplotlib.
5.) It is a data cleaning tool that sits between the data storage and model building stages.
How to install Pandas
Pandas can be installed using pip, the package manager for Python. Please run the command below to install pandas.
pip install pandas
You can also install it using pip3 if you have Python 3 installed. Please run the command below to install it.
pip3 install pandas
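To confirm that the installation worked, a quick check like the one below can be run in a Python interpreter. This is only a minimal sketch; the version it prints depends on what pip installed on your machine.

```python
import pandas as pd

# Print the installed pandas version to confirm the import works.
installed_version = pd.__version__
print(installed_version)
```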
How to load different file formats using pandas
As I said earlier, pandas can load files in different formats such as JSON, CSV, and Excel. Before reading these files we need to import pandas, which is done with the following code.
import pandas
Reading json file using pandas
We will go through reading each file type in turn, starting with JSON. This is dummy code, so please have a JSON file ready and know the path where it is placed. Once that is in place, please run the code below.
import json
with open("name_of_file.json") as jsonfile:
    data = json.load(jsonfile)
An important point about the above code:
The file must be in the current working directory. If it is not, pass the full path to the file instead.
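Once the JSON is loaded, it still has to be turned into a dataframe. The sketch below shows two ways to do that, assuming the JSON file holds a list of records. The sample records and the temporary file are made up for illustration so that the code runs on its own.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample records, written to a temporary file so the sketch is self-contained.
records = [{"name": "A", "age": 42}, {"name": "B", "age": 52}]
json_path = os.path.join(tempfile.mkdtemp(), "name_of_file.json")
with open(json_path, "w") as jsonfile:
    json.dump(records, jsonfile)

# Option 1: let pandas parse the file directly.
df_from_file = pd.read_json(json_path)

# Option 2: load with the json module first, then build the DataFrame.
with open(json_path) as jsonfile:
    data = json.load(jsonfile)
df_from_dict = pd.DataFrame(data)

print(df_from_file)
```

Both options give one row per record here. For deeply nested JSON the result may need further flattening, which this sketch does not cover.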
Reading excel file using pandas
Hope you are done with the JSON format. Now please move with me to reading Excel files using pandas. Please see the code below.
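import pandas as pd
file_name = "name_of_file.xlsx"  # placeholder path to your Excel file
# sheet_name=None loads every sheet and returns a dict of DataFrames keyed by sheet name
data_frame = pd.read_excel(file_name, sheet_name=None)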
An important point about the above code:
Many other parameters can be passed to read_excel. They are listed below with their default values.
sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, **kwds
I suggest going through the pandas documentation for read_excel to learn what each one means. This list reflects the pandas release available when this was written. Newer releases have removed some of these parameters (for example squeeze and convert_float), so please check the documentation for your installed version.
Reading CSV file using pandas
I think you have successfully read the JSON file and the Excel file. Now it is time to read a comma separated values file, also known as a CSV file. This is done with the code below.
import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')
An important point about the above code:
The file must be in the current working directory. If it is not, pass the full path to the file instead.
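read_csv also accepts optional parameters. As a small sketch, the code below uses usecols and nrows on made-up CSV text. The data and the io.StringIO stand-in for a file are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age,city\nA,42,Delhi\nB,52,Mumbai\nC,36,Pune\n"

# usecols keeps only the listed columns; nrows limits how many data rows are read.
data_csv = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"], nrows=2)
print(data_csv)
```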
Removing NaN/Replacing NaN Values using Pandas
As I said earlier in the capabilities part, pandas sits between data storage and model training. When we load data, there is a chance that it contains abnormalities such as NaN values. It is very easy to handle these values using pandas. I am writing the code to replace the NaN values with zero rather than remove them, because removing would delete the entire row or column where a NaN value occurs, and we still need the other values.
import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')
data_csv.fillna(0)
An important note
Calling data_csv.fillna(0) on its own will appear not to work, because fillna returns a new dataframe and leaves the original unchanged. The result has to be assigned to a variable, so the modified code is as below.
import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')
data_csv = data_csv.fillna(0)  # put the result back in a variable
print(data_csv)
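To see the difference between replacing and removing NaN values, here is a small sketch on made-up data. The column names and values are assumptions chosen only to show the behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({"age": [42, np.nan, 36], "score": [1.5, 2.5, np.nan]})

filled = df.fillna(0)   # replace every NaN with 0, keeping all rows
dropped = df.dropna()   # drop any row that contains a NaN

print(df.isna().sum())  # count of NaN values per column in the original
print(filled)
print(dropped)
```

Note that df itself is unchanged after both calls, which is why the result must be assigned to a variable.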
Data Manipulation using pandas
I will create a dataframe from a dictionary and then sum all the values of a single column, age, using pandas.
Please see the code below.
import pandas as pd
data = {'age': [42, 52, 36, 24, 73]}
df = pd.DataFrame(data, columns=['age'])
print(df)
Total = df['age'].sum()
print(Total)
Output of this code
   age
0   42
1   52
2   36
3   24
4   73
227
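Summing is only one of many operations. As a hedged sketch on the same age data, the code below shows a few other common manipulations. The new column name age_next_year is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [42, 52, 36, 24, 73]})

mean_age = df["age"].mean()          # average of the column
oldest = df["age"].max()             # largest value
over_40 = df[df["age"] > 40]         # keep only rows where age is above 40
df["age_next_year"] = df["age"] + 1  # derive a new column from an existing one

print(mean_age, oldest)
print(over_40)
print(df)
```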
What you have learned after reading this blog
I think you now know why studying pandas is important for data science and data analysis. You have also seen the capabilities of pandas and how they are implemented in code. I am also sure that you have understood the outline of this blog. I hope you will provide some feedback so that I can write more effectively. Thanks for sparing some time to read this blog.
Some additional information
AI Sangam is a data science solution and consulting company in India with the vision of providing intelligent solutions to everyday as well as complex problems. We build intelligent chatbots, real time face recognition, and resume filtering based on natural language processing, and we offer Python teaching and live projects in Django, Tornado, and Flask. We also know that IT changes quickly. Keeping this in view, we also teach some of the latest technologies such as Docker, Kubernetes, containers, Google Compute Engine, and Amazon instances, and how to integrate applications on these platforms.
Where to contact us if you have doubts
If you feel you need to improve your technical knowledge, you may contact us at aisangamofficial@gmail.com, or chat or talk with us on Skype with id: live:aisangamofficial. Please do not forget to visit the aisangam YouTube channel. You may also follow us on Facebook, Twitter, LinkedIn, Pinterest, Tumblr, and Reddit.
Hello AI Sangam!!!
It is great to see such a worthwhile blog. I am quite impressed with the way you have presented the article from start to end. I had a lot of trouble understanding data frames, especially how to create a dataframe from a dictionary.
The code you have provided is understandable and effective. I wish you the best of luck.
Dear creativeamerica
Thanks for writing to us. Any feedback is valuable to us. I tried to make the outline before writing the topic so that readers know what they are going to read and learn. Secondly, writing code helps you understand the concept more precisely.
With Regards
AI Sangam
Hello, I would say this is a great article. I expect more such articles in the future from aisangam.
We would like to thank you for such a valuable comment.
With Regards
AI Sangam
Hello AI Sangam!!
This is one of the finest articles I have read. It gave me a great understanding, especially of Removing NaN/Replacing NaN Values using Pandas, as I faced some difficulty with it.
The other part is that it is well documented and agile in nature.
One suggestion from me: please write more technical topics on TensorFlow, as I am a big fan of it. Please continue the TensorFlow series, because it is very helpful to people like us who are starting out in AI and machine learning.
With Regards
Ideas Innovation
Thanks for your valuable and precious words. AI Sangam is a data science company which tries to provide the best quality content and products. I also suggest you visit our official website to know more about us. Our official website is http://www.aisangam.com
With Regards and Sincere thanks
AI Sangam
Hello buddy
When someone reads an article that contains both quality content and code, it becomes easy to understand.
There is an issue in the code under Reading json file using pandas: you have loaded the JSON data but have not read it using pandas. Please provide the code to read the data into a dataframe.
With regards
Programmer
Hello Programmer!!
Hope you are doing great. I forgot to add it in the next line.
Thanks for reminding me. I hope you will get more material from us.
With Regards
http://www.aisangam.com
Hello AI Sangam!!!
It is great to learn concepts from such a site. There were many points about data frames where I faced problems, and they were resolved as I went through the blog.
I also want to add a new point, which is exporting a pandas dataframe to a dictionary by combining multiple row values. Please see the code below.
from pandas import DataFrame
df = DataFrame([['A', 123, 1], ['A', 318, 9], ['C', 178, 6], ['A', 321, 3]], columns=['name', 'value1', 'value2'])
print(df)
d = {}
for i in df['name'].unique():
    d[i] = [{df['value1'][j]: df['value2'][j]} for j in df[df['name']==i].index]
print(d)
Thanks for such an email.
I have understood your point. Highly appreciable.
Great article. What I liked most is the simplicity of the article. The code is also simple and understandable.
With Regards
foundationideas
Thanks. AI Sangam highly appreciates your effort in writing the feedback. Feedback helps us to grow.
Well written as well as well documented blog.
With Regards
Success Stories
Thanks for such feedback.
Your words are precious for us.
Hello AI Sangam. Hope you are fine and good.
DataFrame and pandas are an integral part of machine learning and deep learning. They work with NumPy arrays and anonymous functions to analyse and visualize data. Pandas is a tool for data manipulation.
I read the article written by you. It is great, keep it up.
With regards
glbaat
Thanks for such a compliment and the nice explanation of DataFrame.
With Regards
AI Sangam
I would like to mention some of the pros and cons of the article.
Pros
It is a good and effective article.
Cons
You could cover more manipulation using DataFrame.
Thanks for such words.
I will try to add some more material when I have time.
With Regards
AI Sangam
Hello AI Sangam!!
I face the problem of getting the index number written out when I save a .csv file using pandas. Please help me. I have executed the following command
df.to_csv('myfile.csv')
Thanks selfawarenesshub for reaching out to us. Your question is welcome. As I can see in the question, you have not set index=False. Passing index=False to df.to_csv stops pandas from writing the index column to the file. Please see the official documentation of df.to_csv [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html].
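A minimal sketch of the corrected call is shown below. The dataframe contents are made up, and io.StringIO is used in place of 'myfile.csv' only so that both results can be printed and compared.

```python
import io

import pandas as pd

df = pd.DataFrame({"age": [42, 52]})  # hypothetical data

# io.StringIO stands in for 'myfile.csv' so the result can be inspected directly.
with_index = io.StringIO()
df.to_csv(with_index)                  # default: index written as the first column
without_index = io.StringIO()
df.to_csv(without_index, index=False)  # index column omitted

print(with_index.getvalue())
print(without_index.getvalue())
```

With a real file, the equivalent call would be df.to_csv('myfile.csv', index=False).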
Hello AI Sangam!!
It is great to write to you. Hope you are fine. I would like to appreciate you for writing such content. The content is highly meaningful and it has cleared up many of my concepts. I have a doubt about how to create a dictionary from a dataframe. Along with that, I need to dump it to a file in JSON form. Please help me out.
First of all thanks Creative Ideas.
Please follow the below code
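One way to do this can be sketched as follows, assuming a small made-up dataframe and a temporary output path. The choice of orient="records" is also an assumption; other orientations are available in to_dict.

```python
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "age": [42, 52]})  # hypothetical data

# orient="records" gives one dictionary per row.
records = df.to_dict(orient="records")

# Write the records to a JSON file; the path here is a temporary placeholder.
json_path = os.path.join(tempfile.mkdtemp(), "output.json")
with open(json_path, "w") as jsonfile:
    json.dump(records, jsonfile)

print(records)
```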
Thanks a lot AI Sangam for such a reply. This code really helped me out to clear my questions.
Thanks a lot again.
I have read this article and especially the comment section. I found the work very useful. I am new to pandas and want to know how to convert a dataframe into a NumPy array.
I would like to say that the comment section of aisangam is very informative and I learnt a lot from it.
With regards and love
ritesh
Thank you ritesh for approaching us. I think you need the NumPy array to feed to a machine learning model, which is why you are asking this question. Anyway, the solution to your question is as below.
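A minimal sketch of the conversion, assuming a small made-up dataframe, could look like this. It uses the to_numpy method of the dataframe.

```python
import pandas as pd

df = pd.DataFrame({"age": [42, 52], "height": [170, 165]})  # hypothetical data

array = df.to_numpy()  # 2-D NumPy array, one row per DataFrame row
print(array)
print(array.shape)
```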
This code will convert the dataframe into an array. Hope this has helped you. If you have any more questions, please reply here. I also suggest visiting the links below to know more about us.
http://www.aisangam.com/blog/category/tutorials/long-short-term-memory/
http://www.aisangam.com/blog/category/tutorials/tensorflow-tutorials/
http://www.aisangam.com/blog/category/webframework/
Hello AI Sangam!!
Hope you are doing well and enjoying good health. I read your article thoroughly and found the material very effective. Sir, I want to know how to convert a list into a dataframe, as I was stuck at this point.
Please provide me a demo for that. I would be thankful to you.
Thanks foundationideas for the comment. Please find the code below to help you out.
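As a small sketch with made-up lists, a list can be passed straight to pd.DataFrame. The column names here are assumptions for illustration.

```python
import pandas as pd

# A flat list becomes a single column.
ages = [42, 52, 36]
df_single = pd.DataFrame(ages, columns=["age"])

# A list of lists becomes one row per inner list.
rows = [["A", 42], ["B", 52]]
df_rows = pd.DataFrame(rows, columns=["name", "age"])

print(df_single)
print(df_rows)
```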
With Regards
http://www.aisangam.com
Hello AI Sangam. I am facing an error while reading a CSV file using pandas. The error is as below
UnicodeDecodeError: 'utf-8' codec can't decode byte in position : invalid continuation byte
I do not understand why it occurs or how to resolve it.
Hello Urvashi, how are you? It is a great feeling that people like you are sharing your errors with us here. Please see the explanation below.
A UnicodeDecodeError is typically caused by not specifying the encoding of the file. It happens when the file contains non-ASCII characters and was saved in an encoding other than the one pandas assumes, which is UTF-8 by default. For a quick fix, try opening the file in Sublime Text (a text editor) and re-saving it with encoding 'UTF-8'.
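Another option is to tell pandas the encoding when reading. The sketch below assumes the file was saved as Latin-1, which is only a guess about your file, and it writes a small made-up CSV so that the error and the fix can both be seen.

```python
import os
import tempfile

import pandas as pd

# Write a small CSV in Latin-1; its byte for 'é' is not valid UTF-8 in this position.
csv_path = os.path.join(tempfile.mkdtemp(), "latin1.csv")
with open(csv_path, "w", encoding="latin-1") as csvfile:
    csvfile.write("name,age\nRené,42\n")

try:
    pd.read_csv(csv_path)  # default encoding is UTF-8
    failed_with_default = False
except UnicodeDecodeError:
    failed_with_default = True

# Passing the file's real encoding lets pandas decode it.
data_csv = pd.read_csv(csv_path, encoding="latin-1")
print(failed_with_default)
print(data_csv)
```

If you do not know the real encoding of your file, the value passed to encoding will need some trial and checking of the result.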
With regards
AI Sangam