pandas: powerful Python data analysis toolkit

Quick Introduction to understand Pandas and DataFrame


Data is the new oil in today's scenario

What this tutorial will cover

What is Pandas
Capabilities of Pandas
How to install pandas
How to load different file formats using pandas
Removing NaN Values using Pandas
Data Manipulation using pandas

What is pandas

Pandas is a high-level abstraction over low-level NumPy. Its eloquent syntax and extensive functionality make it my choice for data analysis and data manipulation. Another point that justifies its enormous usage is that pandas is open source. It is a NumFOCUS project, and numerous NumFOCUS open source projects are used by NASA and many others.

Capabilities of Pandas

1.) Reading files of different formats such as JSON, CSV, and many others.
2.) Manipulating data and performing calculations on rows and columns of data.
3.) Reshaping the data and removing abnormalities such as undefined values.
4.) Visualizing the data using matplotlib.
5.) Cleaning data: pandas sits between the data storage and model building stages.

How to install Pandas

Pandas can be installed using pip, the package manager for Python. Run the command below to install pandas.

pip install pandas

You can also install it using pip3 if you have Python 3 installed. Run the command below.

pip3 install pandas
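
To confirm that the installation worked, one quick check is to import pandas and print its version. This is a minimal sketch; the version printed depends on your environment, and the variable name installed_version is only illustrative.

```python
import pandas as pd

# Read the version string of the installed pandas package.
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pandas was most likely installed for a different Python interpreter than the one running the script.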

How to load different file formats using pandas

As mentioned earlier, pandas can load files in different formats such as JSON, CSV, and Excel. Before reading these files we need to import pandas, which is done with the following code.

import pandas as pd

Reading json file using pandas

We will read several file formats, starting with JSON. The code below uses a placeholder file name. Make sure you have a JSON file and know the path where it is stored, then run the code below. It loads the file's contents into a Python object using the standard json module.

import json
with open("name_of_file.json") as jsonfile:
    data = json.load(jsonfile)

Some important points about the code above

Place the file in the working directory, or pass the full path to the file if it is located elsewhere.
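
The snippet above stops at a plain Python object. As a minimal sketch of getting from there to a DataFrame, the example below writes a small JSON file to a temporary directory and loads it two ways: with json.load followed by pd.DataFrame, and with pd.read_json. The sample records and the temporary path are assumptions made only so the example runs on its own.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample records, written to a temporary file so the example is self-contained.
records = [{"name": "A", "age": 42}, {"name": "B", "age": 52}]
json_path = os.path.join(tempfile.mkdtemp(), "name_of_file.json")
with open(json_path, "w") as jsonfile:
    json.dump(records, jsonfile)

# Option 1: load with the json module, then build a DataFrame from the result.
with open(json_path) as jsonfile:
    data = json.load(jsonfile)
df_from_json_module = pd.DataFrame(data)

# Option 2: let pandas parse the file directly.
df_from_read_json = pd.read_json(json_path)

print(df_from_json_module)
```

Both routes give the same columns here. The first is handy when the JSON needs reshaping in Python before it becomes a table.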

Reading excel file using pandas

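With JSON done, let us move on to reading Excel files using pandas. The code below shows the simplest case.

import pandas as pd
file_name = 'name_of_excel_file.xlsx'  # placeholder path to your Excel file
# sheet_name=0 reads the first sheet; sheet_name=None would return a dict of DataFrames, one per sheet
data_frame = pd.read_excel(file_name, sheet_name=0)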

Some important points about the code above

read_excel accepts many other parameters. Its signature, with default values, is listed below.

sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, **kwds

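I suggest reading the pandas documentation for read_excel to learn what each parameter means. The list above reflects an older pandas release; some parameters, such as squeeze and convert_float, have been removed in later versions, so check the documentation for the version you have installed.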

Reading CSV file using pandas

You have now read a JSON file and an Excel file. Next is the comma-separated values file, also known as a CSV file. This is done with the code below.

import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')

Some important points about the code above

Place the file in the working directory, or pass the full path to the file if it is located elsewhere.
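
read_csv also takes optional parameters that control what is read. The sketch below is only an illustration: the CSV content is hypothetical, and io.StringIO stands in for a file on disk so the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = "name,age,city\nAsha,42,Delhi\nRavi,52,Pune\nMeera,36,Mumbai\n"

# usecols keeps only the listed columns; nrows limits how many data rows are read.
data_csv = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"], nrows=2)
print(data_csv)
```

Limiting columns and rows this way is useful for a first look at a large file before loading all of it.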

Removing NaN/Replacing NaN Values using Pandas

As mentioned in the capabilities section, pandas sits between data storage and model training. Loaded data may contain abnormalities such as NaN values, and pandas makes them easy to handle. The code below replaces NaN values with zero rather than deleting the rows or columns where they occur, because the other values in them are still needed.

import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')
data_csv.fillna(0)    # returns a new DataFrame; data_csv itself is unchanged

An important note

fillna returns a new DataFrame instead of modifying data_csv in place, so calling data_csv.fillna(0) on its own appears to have no effect. Assign the result back to a variable, as in the modified code below.

import pandas as pd
data_csv = pd.read_csv('name_of_csv_file')
data_csv = data_csv.fillna(0)    # store the result in a variable
print(data_csv)
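
Replacing with zero is not the only option. As a rough sketch on hypothetical data (the column names age and score are made up for illustration), the example below counts the missing values, drops rows that contain them, and fills each column with a different replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns.
df = pd.DataFrame({"age": [42, np.nan, 36], "score": [1.5, 2.5, np.nan]})

# Count missing values per column before deciding what to do.
missing_counts = df.isna().sum()

# dropna() removes every row that contains at least one NaN.
dropped = df.dropna()

# fillna() with a dict uses a different replacement per column.
filled = df.fillna({"age": df["age"].mean(), "score": 0})
print(filled)
```

Which approach fits depends on the data: dropping rows loses information, while filling with a mean or zero can distort later calculations.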

Data Manipulation using pandas

I will create a DataFrame from a dictionary and sum all the values of its single column, age.
The code below illustrates this.

import pandas as pd
data = {'age': [42, 52, 36, 24, 73]}
df = pd.DataFrame(data, columns=['age'])
print(df)
total = df['age'].sum()
print(total)

Output of this code

   age
0   42
1   52
2   36
3   24
4   73
227
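
Summing is only one of many operations. As a short sketch using the same ages, the example below computes a mean and a maximum, filters rows with a boolean condition, and derives a new column. The column name age_in_months is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [42, 52, 36, 24, 73]})

# Summary statistics on a single column.
average_age = df["age"].mean()
oldest = df["age"].max()

# Boolean filtering: keep only the rows where age is above 40.
over_40 = df[df["age"] > 40]

# Derive a new column from an existing one.
df["age_in_months"] = df["age"] * 12

print(average_age, oldest)
print(over_40)
```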

What you have learned after reading this blog

You should now see why pandas matters for data science and data analysis. You have also seen what pandas is capable of and how those capabilities look in code. I hope the outline of this blog was clear, and that you will leave some feedback so that I can write more effectively. Thanks for taking the time to read this blog.

Some additional information

AI Sangam is a data science solution and consulting company in India with the vision of providing intelligent solutions to everyday as well as complex problems. We build intelligent chatbots, real-time face recognition, and resume filtering based on natural language processing, teach Python, and deliver live projects in Django, Tornado, and Flask. Because IT changes quickly, we also teach newer technologies such as Docker, Kubernetes, containers, Google Cloud Engine, and Amazon instances, along with integrating applications on these platforms.

Where to contact us with doubts

If you have doubts or want to improve your technical knowledge, you may contact us at aisangamofficial@gmail.com, or chat or talk with us on Skype (ID: live:aisangamofficial). Please do not forget to visit the AI Sangam YouTube channel. You may also follow us on
Facebook
Twitter
LinkedIn
Pinterest
Tumblr
Reddit

This Post Has 29 Comments

  1. creativeamerica

    Hello AI Sangam!!!
    It is great to see such a worthwhile blog post. I am quite impressed with the way you have presented the article from start to finish. I had a lot of trouble understanding data frames, especially how to create a DataFrame from a dictionary.

    The code you have provided is understandable and effective. I wish you the best of luck.

    1. AISangam

      Dear creativeamerica
      Thanks for writing to us. Any feedback is valuable to us. I tried to lay out the outline before writing the topic so that readers know what they are going to read and learn. Secondly, writing code helps you understand the concept more precisely.

      With Regards
      AI Sangam

  2. Jasleen

    Hello, I would say this is a great article. I expect more such articles from AI Sangam in the future.

    1. AISangam

      We would like to thank you for such a valuable comment.

      With Regards
      AI Sangam

  3. Ideas Innovation

    Hello AI Sangam!!

    This is one of the finest articles I have read. It gave me a great understanding, especially of Removing NaN/Replacing NaN Values using Pandas, as I had faced some difficulty with it.

    The other point is that it is well documented and agile in nature.

    One suggestion from me: please write more technical topics on TensorFlow, as I am a big fan of it. Please continue the TensorFlow series, because it is very helpful to people like us who are just starting in AI and machine learning.

    With Regards
    Ideas Innovation

    1. AISangam

      Thanks for your valuable and precious words. AI Sangam is a data science company that tries to provide the best quality content and products. I also suggest you visit our official website to learn more about us. Our official website is http://www.aisangam.com

      With Regards and Sincere thanks
      AI Sangam

  4. Programmer

    Hello buddy

    When an article contains both quality content and code, it becomes easy for people to understand.

    There is an issue in the code under Reading json file using pandas: you have loaded the JSON data but have not read it using pandas. Please provide the code to read the data into a DataFrame.

    With regards
    Programmer

    1. AISangam

      Hello Programmer!!

      Hope you are doing great. I forgot to add the next line.

      df = pd.DataFrame(data)
      

      Thanks for reminding me. I hope you will get more stuff from us.

      With Regards
      http://www.aisangam.com

  5. Halle

    Hello AI Sangam!!!

    It is great to learn concepts from such a site. I had faced problems at many points with data frames, and they were resolved as I went through the blog.

    I also want to add a new point: exporting a DataFrame to a dictionary by combining multiple row values. Please see the code below.

    from pandas import DataFrame

    df = DataFrame([['A', 123, 1], ['A', 318, 9], ['C', 178, 6], ['A', 321, 3]], columns=['name', 'value1', 'value2'])
    print(df)
    d = {}
    for i in df['name'].unique():
        d[i] = [{df['value1'][j]: df['value2'][j]} for j in df[df['name'] == i].index]
    print(d)
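
As one possible alternative to the loop above, the same dictionary can be built with groupby. This is a sketch under the assumption that the goal is one list of {value1: value2} pairs per name; it is not necessarily how the commenter would write it.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 123, 1], ["A", 318, 9], ["C", 178, 6], ["A", 321, 3]],
    columns=["name", "value1", "value2"],
)

# Group rows by name, then pair value1 with value2 within each group.
d = {
    name: [{v1: v2} for v1, v2 in zip(group["value1"], group["value2"])]
    for name, group in df.groupby("name")
}
print(d)
```

groupby avoids filtering the whole DataFrame once per unique name, which tends to matter as the data grows.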

    1. AISangam

      Thanks for your comment.

      I have understood your point. It is much appreciated.

  6. foundationideas

    Great article. What I liked most is the simplicity of the article. The code is also simple and understandable.

    With Regards
    foundationideas

    1. AISangam

      Thanks. AI Sangam highly appreciates your effort in writing this feedback. Feedback helps us to grow.

  7. Success Stories

    Well written as well as well documented blog.

    With Regards
    Success Stories

    1. AISangam

      Thanks for such feedback.

      Your words are precious for us.

  8. glbaat

    Hello AI Sangam. Hope you are fine and good.

    Data Frame and pandas are integral part of machine learning and deep learning. They work with numpy arrays and anonymous function to analyse and visualize the data. It is a tool for data manipulation.

    I read the article written by you. Great and keep it up

    With regards
    glbaat

    1. AISangam

      Thanks for such a compliment and for the nice explanation of DataFrame.

      With Regards
      AI Sangam

  9. Btech Projects

    I would like to mention some of the pros and cons of the article
    ——————————————————————————————
    pros
    ——
    It is good and effective article.
    cons
    ——
    You can write some more manipulation using DataFrame.

    1. AISangam

      Thanks for such words.

      I will try to add some more material when I have time.

      With Regards
      AI Sangam

  10. selfawarenesshub

    Hello AI Sangam!!

    I get the index numbers in the output when I save a .csv file using pandas. Please help me. I executed the following command

    df.to_csv('myfile.csv')

    1. AISangam

      Thanks, selfawarenesshub, for reaching out to us. As I can see in your question, you have not set index=False, so pandas writes the index as the first column. Please see the official documentation of df.to_csv [https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html]. The corrected code is below.

      df.to_csv('myfile.csv',index=False)
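
To see the difference without writing a file, to_csv can be called with no path, in which case it returns the CSV text as a string. The small DataFrame below is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"age": [42, 52]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

The first string starts each row with the index value; the second contains only the age column.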
      
  11. Creative Ideas

    Hello AI Sangam!!

    It is great to write to you. Hope you are fine. I would like to thank you for writing such content. It is highly meaningful and has cleared up many of my concepts. I have a doubt about how to create a dictionary from a DataFrame. Along with that, I need to dump it to a file in JSON form. Please help me out.

    1. AISangam

      First of all thanks Creative Ideas.

      Please follow the below code

      import json
      import pandas as pd

      # index_col and skiprows depend on the layout of your CSV file
      dt = pd.read_csv(file_path, index_col=1, skiprows=1).to_dict()
      with open('result.json', 'w') as fp:
          json.dump(dt, fp)
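
As a self-contained variation on the reply above, the sketch below builds a small hypothetical DataFrame in memory, converts it to a list of row dictionaries with to_dict(orient="records"), and serializes it to JSON text. It also shows DataFrame.to_json, which produces JSON directly.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["A", "C"], "value": [123, 178]})

# orient="records" gives one dict per row, which maps naturally onto a JSON array.
records = df.to_dict(orient="records")
json_text = json.dumps(records)

# DataFrame.to_json produces a JSON string directly, without the json module.
json_text_direct = df.to_json(orient="records")

print(json_text)
```

The default to_dict() orientation nests values by column and then by index, so choosing orient explicitly makes the shape of the resulting JSON easier to predict.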
      
      1. Creative Ideas

        Thanks a lot AI Sangam for such a reply. This code really helped me out to clear my questions.

        Thanks a lot again.

  12. ritesh

    I have read this article and especially the comment section. I found the work very useful. I am new to pandas and want to know how to convert a DataFrame into a NumPy array.

    I would like to say that the comment section of AI Sangam is very informative, and I learnt a lot from it.

    With regards and love
    ritesh

    1. AISangam

      Thank you, ritesh, for approaching us. I think you are asking because you need a NumPy array to feed to a machine learning model. Anyway, the solution to your question is as below

      numpy_array = df.values
      

      This code will convert the DataFrame into an array. Hope this has helped you. If you have any more questions, please reply here. I also suggest visiting the links below to learn more about us
      http://www.aisangam.com/blog/category/tutorials/long-short-term-memory/
      http://www.aisangam.com/blog/category/tutorials/tensorflow-tutorials/
      http://www.aisangam.com/blog/category/webframework/
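
Besides the .values attribute, pandas also provides DataFrame.to_numpy(), which the pandas documentation recommends for this conversion. A minimal sketch on a hypothetical two-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [42, 52], "score": [1.5, 2.5]})

# to_numpy() returns the DataFrame's values as a NumPy array.
numpy_array = df.to_numpy()
print(numpy_array.shape)
```

When columns have different types, as here, the array uses a single dtype that can hold all of them.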

  13. foundationideas

    Hello AI Sangam!!

    Hope you are doing well and enjoying good health. I read your article thoroughly and found the material very effective. Sir, I want to know how to convert a list into a DataFrame, as I am stuck at this point.

    Please provide a demo for that. I would be thankful to you.

    1. AISangam

      Thanks foundationideas for such comment. Please find the below code to help you out

      import pandas as pd
      name_of_list = ['a', 'b', 'c']    # placeholder list; use your own
      df = pd.DataFrame({'col': name_of_list})
      

      With Regards
      http://www.aisangam.com

  14. Urvashi

    Hello AI Sangam. I am facing an error while reading a CSV file using pandas. The error is as below

    UnicodeDecodeError: ‘utf-8’ codec can’t decode byte in position : invalid continuation byte

    I am not understanding why it came and how to resolve it.

    1. AISangam

      Hello Urvashi, hope you are well. It is a great feeling that people like you share their errors with us here. Please see the explanation below

      A UnicodeDecodeError typically happens when the file was saved in an encoding other than UTF-8 and contains characters that are not valid UTF-8, while no encoding was specified when reading it. For a quick fix, try opening the file in a text editor such as Sublime Text and re-saving it with encoding 'UTF-8'.

      With regards
      AI Sangam
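
Another option is to tell read_csv which encoding the file uses through its encoding parameter. The sketch below assumes the file was saved as Latin-1, which is only one common possibility; the file name and its contents are hypothetical and are written to a temporary directory so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Write a small file in Latin-1; in this encoding the byte for "é" is not valid UTF-8.
csv_path = os.path.join(tempfile.mkdtemp(), "latin1_example.csv")
with open(csv_path, "w", encoding="latin-1") as f:
    f.write("name,city\nRené,Orléans\n")

# Reading with the default UTF-8 decoding fails on this file.
try:
    pd.read_csv(csv_path)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the encoding the file was actually saved in lets pandas decode it.
data_csv = pd.read_csv(csv_path, encoding="latin-1")
print(data_csv)
```

If the real encoding is unknown, guessing can produce wrong characters without raising an error, so it is worth checking a few decoded values by eye.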
