Skip to main content
HomeCheat sheetsPython

Pandas Cheat Sheet: Data Wrangling in Python

This cheat sheet is a quick reference for data wrangling with Pandas, complete with code samples.
Jun 2021  · 4 min read

By now, you'll already know the Pandas library is one of the most preferred tools for data manipulation and analysis, and you'll have explored the fast, flexible, and expressive Pandas data structures, maybe with the help of DataCamp's Pandas Basics cheat sheet.

Yet, there is still much functionality that is built into this package to explore, especially when you get hands-on with the data: you'll need to reshape or rearrange your data, iterate over DataFrames, visualize your data, and much more. And this might be even more difficult than "just" mastering the basics. 

That's why today's post introduces a new, more advanced Pandas cheat sheet. 

It's a quick guide through the functionalities that Pandas can offer you when you get into more advanced data wrangling with Python. 

(Do you want to learn more? Start our Data Manipulation with pandas course for free now or try out our Pandas DataFrame tutorial! )

Pandas Cheat Sheet

Have this cheat sheet at your fingertips

Download PDF

The Pandas cheat sheet will guide you through some more advanced indexing techniques, DataFrame iteration, handling missing values or duplicate data, grouping and combining data, data functionality, and data visualization. 

In short, everything that you need to complete your data manipulation with Python!

Don't miss out on our other cheat sheets for data science that cover MatplotlibSciPyNumpy, and the Python basics.

Reshape Data 

Pivot 

>>> df3= df2.pivot(index='Date', #Spread rows into columns          columns='Type',          values='Value')

Stack/ Unstack 

>>>stacked= df5.stack() #Pivot a level of column	labels>>> stacked.unstack() #Pivot a level of index labels

Melt 

>>> pd.melt(df2, #Gather columns into rows           id_vars=[''Date''],           value_vars=[''Type'', ''Value''],           value name=''Observations'')

Iteration 

>>> df.iteritems() #{Column-index, Series) pairs>>> df.iterrows() #{Row-index, Series) pairs

Missing Data 

>>> df.dropna() #Drop NaN values>>> df3.fillna(df3.mean()) #Fill NaN values with a predetermined value>>> df2.replace("a", "f") #Replace values with others

Advanced Indexing   

Selecting

>>> df3.loc[:,(df3>1).any()] #Select cols with any vols >1>>> df3.loc[:,(df3>1).all()] #Select cols with vols> 1>>> df3.loc[:,df3.isnull().any()] #Select cols with NaN>>> df3.loc[:,df3.notnull().all()] #Select cols without NaN

Indexing With isin ()

>>> df[(df.Country.isin(df2.Type))] #Find some elements>>> df3.filter(iterns="a","b"]) #Filter on values>>> df.select(lambda x: not x%5) #Select specific elements

Where

>>> s.where(s > 0) #Subset the data

 Query

>>> df6.query('second > first') #Query DataFrame

Setting/Resetting Index 

>>> df.set_index('Country') #Set the index>>> df4 = df.reset_index() #Reset the index>>> df = df.rename(index=str, #Rename          DataFrame columns={"Country":"cntry",          "Capital":"cptl", "Population":"ppltn"})

Reindexing 

>>>  s2   = s. reindex (['a','c','d','e',' b'])

Forward Filling

>>> df.reindex(range(4),          method='ffill')
Country  Capital  Population 
0 Belgium  Brussels 11190846
1 India  New Dehli  1303171035
2 Brazil Brasilia 207847528
3 Brazil Brasilia 207847528

Backward Filling 

>>> s3 = s.reindex(range(5),          method='bfill')
0 3
1 3
2 3
3 3
4 3

Multi-Indexing 

>>>arrays= [np.array([1,2,3]),          np.array([5,4,3])]>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)>>>tuples= list(zip(*arrays))>>>index= pd.Multilndex.from_tuples(tuples,               names= ['first','second'])>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)>>> df2.set_index(["Date", "Type"])

Duplicate Data 

>>> s3.unique() #Return unique values>>> df2.duplicated('Type') #Check duplicates>>> df2.drop_duplicates('Type', keep='last') #Drop duplicates>>> df.index.duplicated() #Check index duplicates

Grouping Data 

Aggregation

>>> df2.groupby(by=['Date','Type']).mean()>>> df4.groupby(level=0).sum()>>> df4.groupby(level=0).agg({'a':lambda x:sum(x)/len (x), 'b': np.sum})

Transformation

>>> customSum = lambda x: (x+x%2)>>> df4.groupby(level=0).transform(customSum)

Combining Data 

Merge 

>>> pd.merge(data1,          data2,           how=' left',           on='X1')

>>> pd.merge(data1,          data2,           how='right',           on='X1')

>>> pd.merge(data1,          data2,           how='inner',           on='X1')

>>> pd.merge(data1,          data2,           how='outer',           on='X1')

Join 

>>> data1.join(data2, how='right')

Concatenate 

Vertical

>>> s.append(s2)

Horizontal/Vertical

>>> pd.concat([s,s2],axis=1, keys=['One','Two'])>>> pd.concat([datal, data2], axis=1, join='inner')

Dates 

>>> df2['Date']= pd.to_datetime(df2['Date'])>>> df2['Date']= pd.date_range('2000-1-1',          periods=6,           freq='M')>>>dates= [datetime(2012,5,1), datetime(2012,5,2)]>>>index= pd.Datetimelndex(dates)>>>index= pd.date_range(datetime(2012,2,1), end, freq='BM')

Visualization 

>>> import matplotlib.pyplot as plt>>> s.plot()>>> plt.show()

>>> df2.plot()>>> plt.show()

Topics
Related

tutorial

Everything You Need to Know About Python Environment Variables

Learn the ins and outs of managing Python environment variables with os and python-dotenv libraries.
Bex Tuychiev's photo

Bex Tuychiev

9 min

tutorial

Everything You Need to Know About Python's Maximum Integer Value

Explore Python's maximum integer value, including system limits and the sys.maxsize attribute.
Amberle McKee's photo

Amberle McKee

5 min

tutorial

Python KeyError Exceptions and How to Fix Them

Learn key techniques such as exception handling and error prevention to handle the KeyError exception in Python effectively.
Javier Canales Luna's photo

Javier Canales Luna

6 min

tutorial

Snscrape Tutorial: How to Scrape Social Media with Python

This snscrape tutorial equips you to install, use, and troubleshoot snscrape. You'll learn to scrape Tweets, Facebook posts, Instagram hashtags, or Subreddits.
Amberle McKee's photo

Amberle McKee

8 min

tutorial

Troubleshooting The No module named 'sklearn' Error Message in Python

Learn how to quickly fix the ModuleNotFoundError: No module named 'sklearn' exception with our detailed, easy-to-follow online guide.
Amberle McKee's photo

Amberle McKee

5 min

tutorial

AWS Storage Tutorial: A Hands-on Introduction to S3 and EFS

The complete guide to file storage on AWS with S3 & EFS.
Zoumana Keita 's photo

Zoumana Keita

16 min

See MoreSee More