Data Analysis with Pandas and Python
In this course, you will learn fast and easy data analysis with Python's powerful pandas library.
Course outline:
- Installation and Setup
- Introduction to the Course
- Completed Course Files
- Mac OS - Download the Anaconda Distribution
- Mac OS - Install Anaconda Distribution
- Mac OS - Access the Terminal
- Mac OS - Update Anaconda Libraries
- Mac OS - Unpack Course Materials + The Startup and Shutdown Process
- Windows - Download the Anaconda Distribution
- Windows - Install Anaconda Distribution
- Windows - Access the Command Prompt and Update Anaconda Libraries
- Windows - Unpack Course Materials + The Startup and Shutdown Process
- Intro to the Jupyter Notebook Interface
- Cell Types and Cell Modes
- Code Cell Execution
- Popular Keyboard Shortcuts
- Import Libraries into Jupyter Notebook
- Python Crash Course, Part 1 - Data Types and Variables
- Python Crash Course, Part 2 - Lists
- Python Crash Course, Part 3 - Dictionaries
- Python Crash Course, Part 4 - Operators
- Python Crash Course, Part 5 - Functions
- Series
- Create Jupyter Notebook for the Series Module
- Create a Series Object from a Python List
- Create a Series Object from a Python Dictionary
- Intro to Attributes
- Intro to Methods
- Parameters and Arguments
- Import Series with the .read_csv() Method
- The .head() and .tail() Methods
- Python Built-In Functions
- More Series Attributes
- The .sort_values() Method
- The inplace Parameter
- The .sort_index() Method
- Python's in Keyword
- Extract Series Values by Index Position
- Extract Series Values by Index Label
- The .get() Method on a Series
- Math Methods on Series Objects
- The .idxmax() and .idxmin() Methods
- The .value_counts() Method
- The .apply() Method
- The .map() Method
- A Review of the Series Module
- DataFrames I
- Intro to DataFrames I Module
- Shared Methods and Attributes between Series and DataFrames
- Differences between Shared Methods
- Select One Column from a DataFrame
- Select Two or More Columns from a DataFrame
- Add New Column to DataFrame
- Broadcasting Operations
- A Review of the .value_counts() Method
- Drop Rows with Null Values
- Fill in Null Values with the .fillna() Method
- The .astype() Method
- Sort a DataFrame with the .sort_values() Method, Part I
- Sort a DataFrame with the .sort_values() Method, Part II
- Sort DataFrame with the .sort_index() Method
- Rank Values with the .rank() Method
- DataFrames II
- Filter a DataFrame Based on a Condition
- DataFrames III
- Intro to the DataFrames III Module + Import Dataset
- Set New Values for a Specific Cell or Row
- Set Multiple Values in DataFrame
- Rename Index Labels or Columns in a DataFrame
- Delete Rows or Columns from a DataFrame
- Working with Text Data
- Intro to the Working with Text Data Module
- String Methods on Index and Columns
- Intro to the MultiIndex Module
- Intro to the Groupby Module
- Merging, Joining, and Concatenating
- Working with Dates and Times
- Intro to the Working with Dates and Times Module
- Review of Python's datetime Module
- The pandas Timestamp Object
- The pandas DatetimeIndex Object
Analyze data quickly and easily with Python's powerful pandas library! All datasets included; beginners welcome!
Installation and Setup
Introduces Python, pandas, Anaconda, Jupyter Notebook, and the course prerequisites. Explores sample Jupyter Notebooks to showcase the power of pandas for data analysis. The pandas.zip file with the working files for this course is attached to this lesson. Download and unpack it in the directory of your choice.
Introduction to the Course
Completed Course Files
The next batch of lessons focuses on the installation and configuration process for pandas on a Mac machine. In this lesson, we download the Anaconda distribution from the Continuum Analytics website. If you're new to Python, choose the 3.5 version of the distribution.
Mac OS - Download the Anaconda Distribution
In this lesson, we install the Anaconda distribution on a Mac OS machine from the executable package we downloaded. The process installs Python and over 100 of the most popular libraries for data science in a central directory on your computer.
Mac OS - Install Anaconda Distribution
The Terminal is an application for communicating with your Mac with text-based commands. In this lesson, you'll learn two ways to access the Terminal on a Mac OS machine.
Mac OS - Access the Terminal
We need to install and update some Python libraries to ensure a smooth process with Jupyter Notebooks and pandas. In this lesson, we use the Terminal to complete the update process.
Mac OS - Update Anaconda Libraries
This course is bundled with a collection of .csv and .xlsx files for you to use. I strongly recommend following along with my tutorials by practicing the syntax on your end. In this lesson, I'll explain the startup and shutdown process for a Jupyter Notebook session. Follow this process every time you come back to the course.
Mac OS - Unpack Course Materials + The Startup and Shutdown Process
The Windows operating system comes in 32-bit and 64-bit versions. In this lesson, we'll access the Control Panel to determine what category your computer falls into and then download the proper version of the Anaconda distribution on the Continuum Analytics website.
Windows - Download the Anaconda Distribution
Run the Anaconda installer package on a Windows computer. The executable installs Python, pandas, Jupyter Notebook and over 100 popular libraries for data analysis.
Windows - Install Anaconda Distribution
Access the Command Prompt on a Windows machine. The prompt (also known as the command line) is used to interact with the computer with text-based commands. We'll use it to download additional Python libraries for the course and update all installed Anaconda libraries.
Windows - Access the Command Prompt and Update Anaconda Libraries
This course is bundled with .csv and .xlsx files. The primary .zip file is attached to the first lesson of this course. In this lesson, we'll unpack the course materials and learn the startup and shutdown process for a Jupyter Notebook. Follow this process as you proceed throughout the course.
Windows - Unpack Course Materials + The Startup and Shutdown Process
Explore the Jupyter Notebook interface including the toolbars and buttons. We'll also dive into the Kernel > Restart options, which reset the connection between the server and the Notebook.
Intro to the Jupyter Notebook Interface
Learn about the two different modes (Edit Mode and Command Mode) within a Jupyter Notebook. Edit Mode modifies the contents of a cell and Command Mode enables keyboard shortcuts to work on the entire Notebook as a whole.
Cell Types and Cell Modes
Learn the multiple keyboard shortcuts to execute code cells and Markdown cells. We'll also learn how Jupyter Notebook chooses what to output below a cell that has multiple commands.
Code Cell Execution
Memorize some popular keyboard shortcuts for adding and deleting cells in a Jupyter Notebook.
Popular Keyboard Shortcuts
Use the import keyword to import Python libraries into a Jupyter Notebook. This lesson covers most of the libraries we will utilize throughout the course including pandas, numpy, and matplotlib.
Import Libraries into Jupyter Notebook
This next batch of lessons offers a quick crash course on the Python programming language. In this lesson, we'll review Python comments, the built-in type function, and variables.
Python Crash Course, Part 1 - Data Types and Variables
In this lesson, we'll review Python lists and how to extract values from them by index position. A list is the equivalent of an array in other programming languages. It is used to store an ordered collection of objects.
Python Crash Course, Part 2 - Lists
Review the Python dictionary object, which associates keys with values. The keys must be unique; the values can be duplicated. Dictionaries are created with curly braces and comma-separated key-value pairs.
Python Crash Course, Part 3 - Dictionaries
Review Python's mathematical and equality operators. These will be critical for pandas filtering processes later in the course.
Python Crash Course, Part 4 - Operators
Define and call a sample Python function. A function is a reusable chunk of code that can accept inputs (arguments) and return outputs. We'll use custom functions later on our pandas object to apply operations to all values in a dataset.
Python Crash Course, Part 5 - Functions
Create a Jupyter Notebook for the Series module. The Series is a one-dimensional pandas object that combines the best features of a Python list and a Python dictionary.
Create Jupyter Notebook for the Series Module
A pandas Series can be created with the pd.Series() constructor method. In this lesson, we'll practice creating a few sample Series by feeding in Python lists as inputs to the constructor method.
Create A Series Object from a Python List
The pd.Series() constructor method accepts a variety of inputs. In this lesson, we'll create a Series from a Python dictionary. We'll also explore the differences between the pandas Series and Python's built-in objects, and understand how the index operates in a Series.
Create A Series Object from a Python Dictionary
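As a rough illustration of the two constructor inputs described above, here is a minimal sketch. The flavor names and prices are made-up sample data, not part of the course files.

```python
import pandas as pd

# From a list: pandas assigns a default integer index starting at 0.
flavors = pd.Series(["Chocolate", "Vanilla", "Strawberry"])

# From a dictionary: the keys become index labels, the values become Series values.
prices = pd.Series({"Chocolate": 3.5, "Vanilla": 3.0, "Strawberry": 3.25})

print(flavors.index.tolist())  # [0, 1, 2]
print(prices["Vanilla"])       # 3.0
```

Unlike a plain list, the Series carries an index alongside its values, which is what makes the dictionary-style lookup by label possible.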
Objects in pandas have attributes and methods. Methods actively interact with and modify the object while attributes return information about the object's state. In this lesson, we'll use the .values, .index, and .dtype attributes on a Series object.
Intro to Attributes
In this lesson, we'll continue our exploration of methods on pandas objects. We'll utilize the .sum(), .product(), and .mean() mathematical methods on a sample Series.
Intro to Methods
Parameters are the options that a method has. Arguments are the choices we choose for those options. In this lesson, we'll learn the syntax of supplying arguments to parameters on pandas methods.
Parameters and Arguments
The time has come to import our first datasets into our Jupyter Notebook work environment. We'll use the pd.read_csv() method to import 2 CSV files, then modify the squeeze parameter's argument to import the data as a Series object instead of a DataFrame.
Import Series with the .read_csv() Method
Use the .head() and .tail() methods to return a specified number of rows from the beginning or end of a Series. The methods return a brand new Series.
The .head() and .tail() Methods
See how the Series interacts with Python's built-in functions including len, type, sorted, list, dict, max, and min. pandas works seamlessly with all of them.
Python Built-In Functions
Explore more attributes on the pandas Series object, including .size, .name, and .is_unique. Attributes return information about the object; methods perform an operation on it and often return a new object.
More Series Attributes
Call the .sort_values() method on a Series to sort the values in ascending or descending order. We'll see how this command operates on both a numeric and alphabetical dataset.
The .sort_values() Method
Modify the argument to the inplace parameter on a Series method to permanently modify the object it is called on. This is an alternative to reassigning the new object to the same variable.
The inplace Parameter
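The difference between returning a new sorted Series and sorting in place can be sketched as follows. The names and scores are hypothetical sample data.

```python
import pandas as pd

scores = pd.Series([72, 95, 88], index=["Ann", "Bo", "Cy"])

# Without inplace, .sort_values() returns a new Series and leaves the original alone.
descending = scores.sort_values(ascending=False)

# With inplace=True, the object the method is called on is itself reordered.
scores_sorted = scores.copy()
scores_sorted.sort_values(inplace=True)

print(descending.index.tolist())  # ['Bo', 'Cy', 'Ann']
print(scores_sorted.tolist())     # [72, 88, 95]
```

Reassigning the result (`scores = scores.sort_values()`) achieves the same end state as `inplace=True`; which one to use is largely a matter of style.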
Call the .sort_index() method on a pandas Series to sort it by the index instead of its values.
The .sort_index() Method
Use Python's in keyword and attributes to check if a value exists in either the values or index of a Series. If the .index or .values attribute is not included, pandas will default to searching among the Series index.
Python's in Keyword
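A small sketch of the behavior described above, using a made-up Series with letter labels:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

label_found = "b" in s            # True: a bare Series is searched by index label
value_as_label = 20 in s          # False: 20 is a value, not an index label
value_found = 20 in s.values      # True: search the values explicitly
label_found_explicit = "b" in s.index  # True: same as the bare check
```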
Use bracket notation to extract Series values by their index position.
Extract Series Values by Index Position
Use bracket notation to extract Series values by their index labels.
Extract Series Values by Index Label
Call the .get() method on a Series to extract values from it. This is an alternative to the traditional bracket syntax.
The .get() Method on a Series
Call popular mathematical methods including .count(), .sum(), and .mean() on a Series. There are additional statistical methods available in the official pandas documentation.
Math Methods on Series Objects
Call the .idxmax() and .idxmin() methods to extract the index labels of the highest or lowest values in a Series. We'll see how these can be used to extract the highest / lowest values as well.
The .idxmax() and .idxmin() Methods
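For illustration, assuming a small made-up Series of daily temperatures, the two methods return the index label of the extreme value, and that label can then be fed back into bracket notation:

```python
import pandas as pd

temps = pd.Series([18.5, 24.0, 15.2], index=["Mon", "Tue", "Wed"])

hottest_day = temps.idxmax()       # label of the largest value
coldest_day = temps.idxmin()       # label of the smallest value
hottest_temp = temps[hottest_day]  # use the label to pull out the value itself

print(hottest_day, hottest_temp)   # Tue 24.0
```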
Call the .value_counts() method to count the number of the times each unique value occurs in a Series. The result will be a brand new Series where each unique value from the original Series serves as an index label.
The .value_counts() Method
Call the .apply() method and feed it a Python function as an argument to use the function on every Series value. This is helpful for executing custom operations that are not included in pandas or numpy.
The .apply() Method
Call the .map() method to tie together the values from one object to another. We'll practice with (a) two Series and (b) a Series and a dictionary object.
The .map() Method
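A minimal sketch of both variants, with hypothetical sample data. Values without a match in the lookup object come back as NaN.

```python
import pandas as pd

pokemon = pd.Series(["Pikachu", "Charmander", "Squirtle"])
types = {"Pikachu": "Electric", "Charmander": "Fire"}

# (b) Series + dictionary: each value is looked up among the dictionary keys.
mapped_from_dict = pokemon.map(types)

# (a) Series + Series: each value is looked up among the other Series' index labels.
lookup = pd.Series(types)
mapped_from_series = pokemon.map(lookup)

# "Squirtle" has no entry in either lookup, so its result is missing (NaN).
print(mapped_from_dict.isna().tolist())  # [False, False, True]
```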
Review the pandas Series concepts you explored in this module with this action-packed quiz!
A Review of the Series Module
Let's create a Jupyter Notebook for this first DataFrame-focused module. We'll import the pandas library and introduce the nba.csv dataset that we'll be using for the next couple of lessons.
Intro to DataFrames I Module
The pandas Series and DataFrame object share many attributes and methods in common. In this lesson, we'll review popular attributes like .index, .values, .shape, .ndim, and .dtypes and see how they work on a 2-D DataFrame. We'll also introduce new attributes including .columns and .axes that are exclusive to DataFrames.
Shared Methods and Attributes between Series and DataFrames
Series and DataFrame may share attributes and methods but they are still different objects. In this lesson, we'll see how identical methods operate differently depending on the pandas object they are called on.
Differences between Shared Methods
Use two syntactical options to extract a single column from a pandas DataFrame. I prefer the square bracket approach because it works 100% of the time. The alternative option is using dot syntax, which treats the columns as attributes of the larger DataFrame object.
Select One Column from a DataFrame
In this lesson, we'll select two or more columns from a pandas DataFrame. We'll still need bracket syntax to extract but now we'll include a Python list to specify the specific columns we'd like to pull out. The result will be a new DataFrame.
Select Two or More Columns from a DataFrame
In addition to extracting existing columns, bracket syntax can be used to create a new column on the right end of a DataFrame and populate it with values. In this lesson, we'll also dive into the alternate .insert() method to insert a column into the middle of a DataFrame.
Add New Column to DataFrame
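The two approaches might look like this in practice. The DataFrame below is a tiny stand-in, not the course's nba.csv file, and the column names are assumptions.

```python
import pandas as pd

players = pd.DataFrame({"Name": ["Ann", "Bo"], "Salary": [100, 200]})

# Bracket assignment appends the column on the right; a single value is
# repeated for every row.
players["Sport"] = "Basketball"

# .insert() places the column at a chosen position (0-based) and modifies
# the DataFrame in place.
players.insert(1, column="Team", value=["Hawks", "Bulls"])

print(players.columns.tolist())  # ['Name', 'Team', 'Salary', 'Sport']
```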
A broadcasting operation performs an operation on all values within a pandas object. In this lesson, we'll apply several mathematical operations to values in a DataFrame column (i.e. a Series) including the .add(), .sub(), .mul() and .div() methods. We'll also cover the operator shortcuts for these methods.
Refresh your memory on the .value_counts() Series method, which counts the number of times each unique value occurs within the Series. The result is a brand new Series.
A Review of the .value_counts() Method
Null values are represented with a NaN marker in pandas. In this lesson, we'll delete rows with null (NaN) values by calling the .dropna() method. We'll also modify the arguments of the method to specify how to select the rows to be deleted.
Drop Rows with Null Values
One alternative to dropping null values is populating them with a predefined value. In this lesson, we'll call the .fillna() method to accomplish this. We'll practice the method on both DataFrame and Series objects.
Fill in Null Values with the .fillna() Method
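Dropping and filling can be compared side by side in a short sketch. The data is made up, and using 0 as the fill value is only an example; the right replacement depends on the dataset.

```python
import numpy as np
import pandas as pd

staff = pd.DataFrame({
    "Name": ["Ann", "Bo", "Cy"],
    "Salary": [100.0, np.nan, 300.0],
})

# Option 1: remove rows that have a null in the Salary column.
dropped = staff.dropna(subset=["Salary"])

# Option 2: keep every row and replace the null with a predefined value.
filled = staff.copy()
filled["Salary"] = filled["Salary"].fillna(0)

print(len(dropped))               # 2
print(filled["Salary"].tolist())  # [100.0, 0.0, 300.0]
```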
Data types in a Series will not always be the types we want or the types that are best for efficiency. In this lesson, we'll convert the data types in a Series with the .astype() method. We'll also show how to overwrite an old Series with a Series of new data values.
The .astype() Method
Call the .sort_values() method to sort the values in a DataFrame based on the values in a single column. The method is a bit more complex than when called on a single-dimensional pandas Series.
Sort a DataFrame with the .sort_values() Method, Part I
In this lesson, we'll explore additional parameters to the .sort_values() method to sort the values in a DataFrame based on the values in multiple columns. We'll also cover how to specify different sort orders (ascending vs. descending) on different columns.
Sort a DataFrame with the .sort_values() Method, Part II
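A sketch of a multi-column sort with a different order per column, using hypothetical team and salary data:

```python
import pandas as pd

roster = pd.DataFrame({
    "Team": ["B", "A", "B", "A"],
    "Salary": [10, 30, 20, 40],
})

# Sort by Team ascending, then by Salary descending within each team.
# The ascending list lines up position by position with the by list.
result = roster.sort_values(by=["Team", "Salary"], ascending=[True, False])

print(result["Salary"].tolist())  # [40, 30, 20, 10]
```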
Call the .sort_index() method to sort the values in a DataFrame based on their index positions or labels instead of their values.
Sort DataFrame with the .sort_index() Method
Values in a Series can be ranked in order with the .rank() method. In this lesson, we'll practice this method on a numeric Series and then confirm the results through our own sort test.
Rank Values with the .rank() Method
Create the Jupyter Notebook for this second DataFrame-focused module. The focus of this module is filtering -- how we extract rows from a DataFrame that fit one or more conditions. We'll be using an employees.csv dataset consisting of workers from a fictional company.
This Module's Dataset + Memory Optimization
In this lesson, we'll filter rows from the DataFrame based on a single condition. The logic involves creating a Boolean Series of True and False values, then passing it in square brackets after our DataFrame.
Filter a DataFrame Based on A Condition
In this lesson, we'll explore more complex row filtering based on multiple conditions. The syntax requires some additional symbols (&) to specify that we want to check the truthiness of multiple conditions.
Filter with More than One Condition (AND - &)
In this lesson, we'll continue filtering rows from the DataFrame based on multiple conditions. However, this time we'll use a new symbol ( | ) to specify an OR check. This requires only one of the tested conditions to evaluate to True in order to include the row.
Filter with More than One Condition (OR - |)
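The AND and OR filters from the last two lessons can be sketched like this. The DataFrame is a small made-up stand-in for the employees dataset, and its column names are assumptions.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Ann", "Bo", "Cy", "Di"],
    "Team": ["Sales", "Legal", "Sales", "Legal"],
    "Salary": [50, 90, 80, 40],
})

# Each comparison produces a Boolean Series with one True/False per row.
is_sales = employees["Team"] == "Sales"
high_paid = employees["Salary"] > 60

both = employees[is_sales & high_paid]    # rows where both conditions hold
either = employees[is_sales | high_paid]  # rows where at least one holds

print(both["Name"].tolist())    # ['Cy']
print(either["Name"].tolist())  # ['Ann', 'Bo', 'Cy']
```

Storing each condition in its own variable keeps the final filter readable; when conditions are written inline, each one needs its own parentheses because & and | bind more tightly than comparison operators.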
A common data challenge is extracting a value only if it is in a collection of values. Instead of creating multiple OR statements, we can invoke the .isin() method to extract rows from a DataFrame where a column value exists in a predefined collection such as a Python list.
The .isin() Method
Call the .isnull() and .notnull() methods to create Boolean Series for extracting rows with null or non-null values. Both methods return a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.
The .isnull() and .notnull() Methods
Call the .between() method to extract rows where a column value falls within a predefined range. This is another method that returns a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.
The .between() Method
Call the .duplicated() method to create a Boolean Series and use it to extract rows that have duplicate values. This is another example of a method that returns a Boolean Series object, which can be passed within square brackets after the DataFrame to filter it.
The .duplicated() Method
An alternative option to identifying duplicate rows and removing them through filtering is the .drop_duplicates() method. In this lesson, we'll invoke the method to remove rows with duplicate values in a DataFrame. We'll also provide custom arguments to modify how the method operates.
The .drop_duplicates() Method
Call the .unique() and .nunique() methods on a Series to extract the unique values and a count of the unique values. These methods are one letter apart but return completely different results. In addition, the .nunique() method requires an additional argument to include null values in its count.
The .unique() and .nunique() Methods
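A quick sketch of the difference, on a made-up numeric Series that contains one null:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 1.0, np.nan])

uniques = s.unique()                        # array of distinct values, NaN included
count = s.nunique()                         # nulls are excluded by default
count_with_null = s.nunique(dropna=False)   # dropna=False counts the null as well

print(len(uniques), count, count_with_null)  # 3 2 3
```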
Create the Jupyter Notebook for this third DataFrame-focused module. These lessons cover how to:
- set and reset an index in a DataFrame
- retrieve rows by index position or index label
- set new values for one or more cells in the DataFrame
- rename or delete rows or columns
- create a random sample of rows / columns
Intro to the DataFrames III Module + Import Dataset
The standard procedure in pandas is to add a numeric index starting at 0. In this lesson, we'll call the .set_index() and .reset_index() methods to alter the index of a DataFrame.
The .set_index() and .reset_index() Methods
One or more rows can be extracted from a DataFrame based on index position or index labels. In this lesson, we'll use the .loc method to retrieve rows based on index label.
Retrieve Rows by Index Label with .loc
One or more rows can be extracted from a DataFrame based on index position or index labels. In this lesson, we'll use the .iloc method to retrieve rows based on index position.
Retrieve Rows by Index Position with .iloc
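Label-based and position-based retrieval can be contrasted in a short sketch. The films below are sample rows chosen for illustration; they are not loaded from the course dataset.

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Goldfinger", "Skyfall", "Spectre"],
    "Year": [1964, 2012, 2015],
}).set_index("Film")

by_label = films.loc["Skyfall"]   # row whose index label is "Skyfall"
by_position = films.iloc[1]       # second row, counting from 0

# Label slices include both endpoints; position slices exclude the stop.
label_slice = films.loc["Goldfinger":"Skyfall"]
position_slice = films.iloc[0:2]

print(by_label["Year"], by_position["Year"])  # 2012 2012
```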
Use the .ix method to retrieve DataFrame rows based on either index label or index position. This is a catch-all method that combines the best features of the .loc and .iloc methods.
The Catch-All .ix Method
The .loc, .iloc, and .ix methods can take second arguments to specify the column(s) that should be extracted. In this lesson, we'll practice extracting movies from our dataset with this syntax.
Second Arguments to .loc, .iloc, and .ix Methods
In this lesson, we'll discuss how to assign a new value to one cell in a DataFrame. We first extract the cell value by using the .ix method with a row and column argument, then reset its value with the assignment operator (=).
Set New Values for a Specific Cell or Row
We can assign a new value to multiple cells in a DataFrame. In this lesson, we'll use the .ix method to extract a subset from a DataFrame, then reassign all column values in that subset.
Set Multiple Values in DataFrame
In this lesson, we'll call the .rename() method on a DataFrame to change the names of the index labels or column names. The method takes an argument of a Python dictionary where the key represents the current column name and the value represents the new column name. We'll also discuss an alternative syntax (the .columns attribute) for changing the column names.
Rename Index Labels or Columns in a DataFrame
Practice three different syntactical options to delete rows or columns from a DataFrame. They include the .drop() method, the .pop() method, and Python's built-in del keyword.
Delete Rows or Columns from a DataFrame
In this lesson, we'll call the .sample() method to pull out a random sample of rows or columns from a DataFrame. We'll specify the number of values to include by modifying the n parameter.
Create Random Sample with the .sample() Method
There is a shortcut available to pull out the rows with the smallest or largest values in a column. Instead of sorting the rows and using the .head() method, we can call the .nsmallest() and .nlargest() methods. We'll dive into these methods and their parameters in this lesson.
The .nsmallest() and .nlargest() Methods
Sometimes, you'll want to retain the structure of the original DataFrame when you extract a subset. In this lesson, we'll call the .where() method to return a modified DataFrame that holds NaN values for all rows that don't match our provided condition.
Filtering with the .where() Method
Our filtration process so far has involved using official pandas syntax. In this lesson, I'll introduce the .query() method, an alternate string-based syntax for extracting a subset from a DataFrame.
The .query() Method
In this review of a lesson from our Series Module, we'll call the .apply() method on a Series to apply a Python function on every value within it. This will act as a foundation for the next lesson, where we'll invoke the same method on a DataFrame.
A Review of the .apply() Method on Single Columns
The .apply() method applies a Python function on a row-by-row basis in a DataFrame. In this example, we'll create a custom ranking function for our films, then demonstrate how it can be applied to a DataFrame.
The .apply() Method with Row Values
The default bracket syntax extracts a component of the larger DataFrame. Any operations on that component may affect the larger DataFrame. If we want to separate the two objects, we can use the .copy() method, which creates an independent copy of a pandas object.
The .copy() Method
Working with Text Data
Datasets can arrive with plenty of poorly formatted data. The Working with Text Data module introduces the string methods available in pandas to clean your data. In this introductory lesson, we'll create the Jupyter Notebook for this module and import a CSV file with public data on Chicago employees. We'll also optimize the DataFrame for speed and efficiency.
Intro to the Working with Text Data Module
String methods in pandas require a .str prefix to operate properly. In this lesson, we'll explore four popular string methods:
- .str.lower() to convert a string's characters to lowercase
- .str.upper() to convert a string's characters to uppercase
- .str.title() to capitalize the first letter of every word in a string
- .str.len() to return a count of the number of characters in a string
Common String Methods - lower, upper, title, and len
The .str.replace() method replaces a substring within a string with another value that the user provides. In this lesson, we'll practice calling the method on a Series of string values. We'll use the method to convert our Employee Annual Salary column to a proper numeric column.
The .str.replace() Method
In this lesson, we'll introduce the .str.contains(), .str.startswith(), and .str.endswith() methods. All three create a Boolean Series, which can be used to extract rows from a DataFrame. We'll also discuss case normalization to increase the accuracy of our results.
Filtering with String Methods
In this lesson, we'll invoke the .str.strip() family of methods to remove leading and trailing whitespace from strings in a Series. The .str.lstrip() method removes whitespace from the left side (beginning) of a string, the .str.rstrip() method removes whitespace from the right side (end) of a string, and the .str.strip() method does both.
More String Methods - strip, lstrip, and rstrip
The past few lessons focused on calling string methods on the values in a column of our dataset. In this lesson, we'll familiarize ourselves with calling the same string methods on the index labels and column names of a DataFrame.
String Methods on Index and Columns
Strings can often contain multiple pieces of information that are separated by a common delimiter. In this lesson, we'll introduce the .str.split() method, which can split a string value based on an occurrence of a user-specified value. This is equivalent to the Text to Columns feature in Microsoft Excel.
Split Strings by Characters with .str.split() Method
In this lesson, we'll utilize additional parameters on the .str.split() method to modify its performance. We'll extract the first names of all the employees in our dataset, a slightly more challenging puzzle than the one in the previous lesson.
More Practice with Splits
In this lesson, we'll explore even more parameters on the .str.split() method. The expand parameter allows us to expand the generated Python list into DataFrame columns while the n parameter limits the total number of splits.
The expand and n Parameters of the .str.split() Method
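The effect of the two parameters can be sketched on a couple of made-up names (the course's actual employee names are formatted differently, so treat this only as an illustration):

```python
import pandas as pd

names = pd.Series(["Mary Ann Doe", "John Smith"])

# Default behavior: each value becomes a Python list of pieces.
first_names = names.str.split(" ").str.get(0)

# n=1 stops after the first split; expand=True spreads the pieces
# across DataFrame columns instead of keeping them in lists.
expanded = names.str.split(" ", n=1, expand=True)

print(first_names.tolist())   # ['Mary', 'John']
print(expanded[1].tolist())   # ['Ann Doe', 'Smith']
```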
The index of a pandas object can include multiple levels or layers. The object that stores this index is called a MultiIndex. In this lesson, we'll create a Jupyter Notebook for this module and explore our dataset.
Intro to the MultiIndex Module
In this lesson, we'll create a multi-layer MultiIndex on a DataFrame with the .set_index() method. The method can be passed a list instead of a string to transfer multiple columns to the index.
Create a MultiIndex with the set_index() Method
The .index attribute will return the object that makes up the index of a DataFrame. In this lesson, we'll combine this attribute with the .get_level_values() method to extract the values from one of its layers.
The .get_level_values() Method
The levels or layers of a MultiIndex can be changed. In this lesson, we'll call the .set_names() method on a MultiIndex object to rename its levels.
The .set_names() Method
In this lesson, we'll explore how the .sort_index() method operates on a MultiIndex DataFrame. We'll provide a list of arguments to the ascending parameter to modify how each level is sorted.
The sort_index() Method
In this lesson, we'll review the familiar .loc and .ix methods for extracting rows from a MultiIndex DataFrame. This time around, we'll feed a tuple argument to specify a value to search for in every layer of the MultiIndex.
Extract Rows from a MultiIndex DataFrame
In this lesson, we'll call the .transpose() method on a MultiIndex DataFrame to swap its row and column axes. This is a convenience method that avoids having to reset and set each index manually.
The .transpose() Method and MultiIndex on Column Level
The .swaplevel() method swaps two levels within a MultiIndex. In this lesson, we'll practice this method with our bigmac dataset. If the MultiIndex consists of only two levels, no additional arguments are required.
The .swaplevel() Method
The .stack() method stacks an index from the column axis to the row axis. It essentially transfers the columns to the row index. In this lesson, we'll see a live example on our bigmac dataset.
The .stack() Method
The .unstack() method does the exact opposite of the .stack() method. It moves an index level from the rows to the columns. In this lesson, we'll call the method without any arguments.
The .unstack() Method, Part 1
In this lesson, we'll continue our exploration of the .unstack() method. We'll introduce the different kinds of arguments we can feed it, including positive integers, negative integers, and index level names.
The .unstack() Method, Part 2
Multiple levels of the row-based MultiIndex can be shifted with the .unstack() method. In this lesson, we'll explore how to provide a list argument to the level parameter to move multiple layers at a time. We'll also introduce the fill_value parameter to plug in missing values in the resulting DataFrame.
The .unstack() Method, Part 3
In this lesson, we'll reorganize the unique values in a DataFrame column as the column headers with the .pivot() method. This can be a particularly effective method for shortening the length of the DataFrame.
The .pivot() Method
In this lesson, we'll emulate Excel's Pivot Table functionality with the .pivot_table() method. We'll explore the values, index, columns, and aggfunc parameters. We'll also discuss the variety of aggregation functions that we can use including sum, count, max, and min.
The .pivot_table() Method
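As a hedged illustration of how the four parameters fit together, here is a sketch on a small invented sales table (the column names are assumptions, not the course dataset):

```python
import pandas as pd

sales = pd.DataFrame({
    "Salesman": ["Ann", "Ann", "Bo", "Bo"],
    "Quarter": ["Q1", "Q2", "Q1", "Q1"],
    "Revenue": [100, 150, 200, 50],
})

# One row per Salesman, one column per Quarter, summed Revenue in the cells.
table = sales.pivot_table(
    values="Revenue", index="Salesman", columns="Quarter", aggfunc="sum"
)

print(table.loc["Bo", "Q1"])  # 250
```

Combinations that never occur in the data (here, Bo in Q2) show up as NaN in the resulting table.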
The pd.melt() method can effectively perform anti-pivot operations. In this lesson, we'll call the method on a DataFrame to convert its current data structure into a more tabular format. We'll also explore the optional parameters available to modify the resulting column names in the new DataFrame.
The pd.melt() Method
The pandas DataFrameGroupBy object allows us to create groupings of data based on common values in one or more DataFrame columns. In this lesson, we'll set up a new Jupyter Notebook in preparation for this module.
Intro to the Groupby Module
The GroupBy object does not offer us much of substance until we call a method on it. In this lesson, we'll call the .first(), .last(), and .size() methods on a GroupBy object to gain a better understanding of its internal data structure.
First Operations with groupby Object
The .get_group() method extracts a grouping from a GroupBy object. In this lesson, we'll practice pulling out a few groups from our companies dataset.
Retrieve A Group with the .get_group() Method
Aggregation methods allow us to perform calculations on all groupings within a GroupBy object. In this lesson, we'll call some mathematical methods on the groups, including the .sum(), .mean(), and .max() methods.
Methods on the Groupby Object and DataFrame Columns
A GroupBy object does not have to be made up of values from a single column. In this lesson, we'll create a new GroupBy object based on unique value combinations from two of our DataFrame columns.
Grouping by Multiple Columns
Certain situations may require different aggregation methods on different columns within our groupings. In this lesson, we'll invoke the .agg() method on our GroupBy object to apply a different aggregation operation to each inner column.
The .agg() Method
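One common way to apply a different aggregation to each column is to pass .agg() a dictionary that maps column names to operations. The sketch below assumes invented column names and values.

```python
import pandas as pd

# Hypothetical stand-in for the companies dataset
companies = pd.DataFrame({
    "Sector": ["Tech", "Retail", "Tech", "Retail"],
    "Revenue": [10, 20, 30, 40],
    "Profits": [1, 5, 3, 2],
    "Employees": [100, 200, 300, 400],
})

# Each key is a column; each value is the aggregation for that column
summary = companies.groupby("Sector").agg(
    {"Revenue": "sum", "Profits": "max", "Employees": "mean"}
)
print(summary)
```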
A standard Python for loop can be used to iterate over the groups in a pandas GroupBy object. In this lesson, we'll loop over all of our groupings to extract selected rows from each inner DataFrame. We'll append these rows to a running DataFrame and then view the final result.
Iterating through Groups
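The loop described above might look like the following sketch. It is not the lesson's exact code: it collects the selected rows in a Python list and combines them once with pd.concat() instead of growing a running DataFrame, and the data is invented.

```python
import pandas as pd

# Hypothetical stand-in for the companies dataset
companies = pd.DataFrame({
    "Company": ["A", "B", "C", "D"],
    "Sector": ["Tech", "Retail", "Tech", "Retail"],
    "Revenue": [10, 20, 30, 40],
})

top_rows = []
# Each iteration yields the group key and that group's DataFrame
for sector, group in companies.groupby("Sector"):
    # Keep the single highest-revenue row from each sector
    top_rows.append(group.nlargest(1, "Revenue"))

result = pd.concat(top_rows)
print(result)
```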
Merging, Joining, and Concatenating
The Merging, Joining, and Concatenating module focuses on combining data from multiple DataFrames into one. In this introductory lesson, we'll set up a new Jupyter Notebook for this module and import the CSV files that we will use.
Intro to the Merging, Joining, and Concatenating Module
The pd.concat() method is used to concatenate two or more DataFrames together. The process is simple when the DataFrames have an identical structure. In this lesson, we'll also explore how to overwrite the concatenated index with a new one.
The pd.concat() Method, Part 1
In this lesson, we'll use the keys parameter on the pd.concat() method to identify which DataFrame each row came from. This will create a MultiIndex DataFrame whose outermost level holds the keys we pass as identifiers for each DataFrame.
The pd.concat() Method, Part 2
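The two pd.concat() lessons can be sketched together. The weekly tables below are invented; ignore_index=True is one way to overwrite the concatenated index, and keys adds the identifying outer level.

```python
import pandas as pd

# Hypothetical weekly tables with an identical structure
week1 = pd.DataFrame({"Customer": [1, 2], "Food": ["Burger", "Salad"]})
week2 = pd.DataFrame({"Customer": [3, 4], "Food": ["Pizza", "Soup"]})

# Part 1: stack the rows and replace the index with 0..n-1
stacked = pd.concat([week1, week2], ignore_index=True)

# Part 2: label each source DataFrame in the outer index level
labeled = pd.concat([week1, week2], keys=["Week 1", "Week 2"])
print(labeled)
```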
In this lesson, we'll call the .append() method on a DataFrame to concatenate another DataFrame to the end. This is an alternative syntax to the pd.concat() method, which is called directly on the pandas library.
The .append() Method on a DataFrame
An inner join merges the values in two DataFrames based on common values across one or more columns. In this lesson, we'll explore the concept by merging on identical values in a single column.
Inner Joins, Part 1
This lesson continues our exploration of the .merge() method. This time, we'll merge the values in two DataFrames based on common values in multiple columns. We'll also validate the data with some filtering.
Inner Joins, Part 2
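Both inner-join lessons can be illustrated with one hedged sketch. The tables and the suffixes are assumptions for demonstration: the first merge matches on a single column, the second on a combination of two.

```python
import pandas as pd

# Hypothetical weekly sales tables
week1 = pd.DataFrame({"Customer ID": [1, 2, 3], "Food ID": [10, 20, 30]})
week2 = pd.DataFrame({"Customer ID": [2, 3, 4], "Food ID": [20, 31, 40]})

# Part 1: customers who appear in both weeks
both_weeks = week1.merge(
    week2, how="inner", on="Customer ID",
    suffixes=(" - Week 1", " - Week 2"),
)

# Part 2: customers who ordered the same food in both weeks
same_order = week1.merge(week2, how="inner", on=["Customer ID", "Food ID"])
print(same_order)
```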
An outer join combines values that exist in either DataFrame into a central DataFrame. In this lesson, we'll invoke the .merge() method with a modified argument to the how parameter to perform an outer join on our weekly sales data sets.
A left join establishes one of the DataFrames as the base dataset for the merge. It attempts to find each value in another DataFrame and drag over that DataFrame's rows when there's a value match. In this lesson, we'll practice executing this join with the .merge() method.
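The outer and left joins described above differ from an inner join only in the argument passed to how. A minimal sketch with invented tables follows; the indicator parameter is optional and simply records where each row came from.

```python
import pandas as pd

# Hypothetical weekly sales tables
week1 = pd.DataFrame({"Customer ID": [1, 2, 3], "Week 1 Food": [10, 20, 30]})
week2 = pd.DataFrame({"Customer ID": [2, 3, 4], "Week 2 Food": [20, 31, 40]})

# Outer join: keep customers found in either table
outer = week1.merge(week2, how="outer", on="Customer ID", indicator=True)

# Left join: keep every week1 row, pulling in week2 data where it matches
left = week1.merge(week2, how="left", on="Customer ID")
print(outer)
```

Rows without a match on the other side are filled with NaN.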
DataFrames may come equipped with different names for columns that represent the same data. In this lesson, we'll talk about how to utilize the left_on and right_on parameters to specify how to match values in differently named columns across two DataFrames.
The left_on and right_on Parameters
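A short sketch of matching differently named columns, assuming two invented tables in which "Food ID" and "Item ID" hold the same kind of value:

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
sales = pd.DataFrame({"Food ID": [1, 2, 1], "Qty": [3, 1, 2]})
foods = pd.DataFrame({"Item ID": [1, 2], "Name": ["Burger", "Salad"]})

merged = sales.merge(
    foods, how="left", left_on="Food ID", right_on="Item ID"
)
print(merged)
```

Both key columns survive in the result, so one of them is often dropped afterwards.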
Our merges so far have involved matches based on common column values. In this lesson, we'll explore how to merge DataFrames based on common index labels.
Merging by Indexes with the left_index and right_index Parameters
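As an illustration, the sketch below matches a column in one DataFrame against the index labels of another. The data is made up, and right_index=True is the piece that tells pandas to look in the index.

```python
import pandas as pd

# Hypothetical lookup table keyed by its index
customers = pd.DataFrame(
    {"Name": ["Ann", "Ben"]}, index=pd.Index([101, 102], name="ID")
)
sales = pd.DataFrame({"Customer ID": [102, 101, 102], "Total": [5.0, 7.5, 3.0]})

# Match the "Customer ID" column against the index of customers
merged = sales.merge(
    customers, how="left", left_on="Customer ID", right_index=True
)
print(merged)
```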
Call the .join() method, a simple way to combine the columns of two DataFrames side by side when they share the same index. This is a shortcut for a more explicit .merge() call.
The .join() Method
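A minimal sketch of .join() on two invented DataFrames that share index labels, alongside the more explicit .merge() call it stands in for:

```python
import pandas as pd

# Hypothetical tables sharing the same index labels
satisfaction = pd.DataFrame({"Satisfaction": [4, 5]}, index=["a", "b"])
visits = pd.DataFrame({"Visits": [10, 12]}, index=["a", "b"])

# Columns are placed side by side, aligned on the index
joined = satisfaction.join(visits)

# The longer equivalent using .merge()
merged = satisfaction.merge(
    visits, how="left", left_index=True, right_index=True
)
print(joined)
```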
Call the pd.merge() method on the pandas library to merge two DataFrames. This is an alternate syntax to calling the .merge() method directly on a DataFrame.
The pd.merge() Method
Working with Dates and Times
The Working with Dates and Times module offers a review of Python's built-in objects for working with dates and times as well as a comprehensive introduction to similar tools in the pandas library. In this lesson, we'll create our Jupyter Notebook for this module and import Python's datetime module.
Intro to the Working with Dates and Times Module
Python includes built-in date and datetime objects for working with dates and times. This lesson offers a review of how we can create these objects as well as some of the attributes (.year, .month, .day etc) that are available on them.
Review of Python's datetime Module
The pandas library includes its own Timestamp object to represent moments in time. In this lesson, we'll use the pd.Timestamp() constructor method with a variety of inputs (strings, date objects, datetime objects) to create some Timestamp objects.
The pandas Timestamp Object
A DatetimeIndex is a pandas object for storing multiple Timestamp objects. In this lesson, we'll create a few DatetimeIndex objects from Python lists.
The pandas DatetimeIndex Object
The pd.to_datetime() method is a convenience method to convert various inputs to pandas-focused objects. In this lesson, we'll pass a variety of inputs (date objects, datetime objects, strings, lists) to the constructor method to see what it returns.
The pd.to_datetime() Method
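To make the conversion concrete, here is a hedged sketch with arbitrary example dates. A single input comes back as a Timestamp, while a list comes back as a DatetimeIndex.

```python
import datetime as dt

import pandas as pd

# Single values become a Timestamp
from_string = pd.to_datetime("2001-04-19")
from_date = pd.to_datetime(dt.date(2015, 1, 1))
from_datetime = pd.to_datetime(dt.datetime(2015, 1, 1, 14, 35))

# A list becomes a DatetimeIndex
from_list = pd.to_datetime(["2015-01-03", "2016-02-08"])
print(from_list)
```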
Over the course of the next three lessons, we'll call the pd.date_range() method to generate a DatetimeIndex of Timestamp objects. This constructor method includes 3 critical parameters (start, end, and periods); we need to provide 2 of these 3 for it to function. In this lesson, we'll see how the pd.date_range() method operates with arguments for the start and end parameters.
Create Range of Dates with the pd.date_range() Method, Part 1
In this lesson, we'll see how the pd.date_range() method operates with arguments for the start and periods parameters. This approach creates a set number of dates beginning from a specific point.
Create Range of Dates with the pd.date_range() Method, Part 2
In this lesson, we'll see how the pd.date_range() method operates with arguments for the end and periods parameters. This approach creates a set number of dates, proceeding backwards from a specified date point. We'll also continue our exploration of the freq parameter to vary the durations between each Timestamp.
Create Range of Dates with the pd.date_range() Method, Part 3
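The three parameter pairings covered across these lessons can be sketched side by side. The dates and frequencies are arbitrary examples.

```python
import pandas as pd

# Part 1: start and end (daily frequency by default)
by_bounds = pd.date_range(start="2016-01-01", end="2016-01-10")

# Part 2: start and periods, counting forward
forward = pd.date_range(start="2016-01-01", periods=5, freq="D")

# Part 3: end and periods, counting backward in 2-day steps
backward = pd.date_range(end="2016-01-10", periods=3, freq="2D")
print(backward)
```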
The .dt accessor on a Series of Timestamp objects allows us to access specific datetime properties, much like the .str accessor allows us to call specific methods on a Series of strings. In this lesson, we'll explore popular attributes like .day, .weekday_name, and .month.
The .dt Accessor
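A short sketch of the accessor on an invented Series. One caveat: the .weekday_name attribute was removed in pandas 1.0, so this example uses the .day_name() method that replaced it.

```python
import pandas as pd

# Hypothetical Series of three consecutive dates
dates = pd.Series(pd.date_range(start="2016-01-01", periods=3, freq="D"))

days = dates.dt.day            # day of the month
months = dates.dt.month        # month number
names = dates.dt.day_name()    # replaces the older .weekday_name
print(names)
```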
This module and future modules in the course rely on the pandas-datareader library to fetch financial datasets from Google Finance. In this lesson, we'll use the Terminal (Mac OS) to install the pandas-datareader library.
On Mac, the commands are:
source activate root
conda install pandas-datareader
If you're following along on a Windows machine, open the Command Prompt and execute the following commands:
activate root
conda install pandas-datareader
Install pandas-datareader Library
In this lesson, we'll import our pandas_datareader library and fetch a financial dataset from Google Finance. This is a real-life example of a dataset with a DatetimeIndex.
Import Financial Data Set with pandas_datareader Library
The process for extracting rows from a DataFrame with a DatetimeIndex is no different than in previous modules. In this lesson, we'll review the familiar .loc, .iloc, and .ix methods. As a reminder, these methods conclude with square brackets, not parentheses.
Selecting Rows from a DataFrame with a DatetimeIndex
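As a rough sketch on an invented price table (not the financial dataset from the lesson), .loc selects by date label and .iloc by position. The .ix indexer is omitted because it was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical daily closing prices
prices = pd.DataFrame(
    {"Close": list(range(10))},
    index=pd.date_range(start="2016-01-01", periods=10, freq="D"),
)

one_value = prices.loc[pd.Timestamp("2016-01-03"), "Close"]  # by label
window = prices.loc["2016-01-03":"2016-01-05"]               # both ends included
first_row = prices.iloc[0]                                   # by position
print(window)
```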
In this lesson, we'll access some of the datetime attributes available on a pandas Timestamp object. We'll also use the .insert() method to add new columns at the beginning of our DataFrame.
Timestamp Object Attributes
The .truncate() method is a convenience method for slicing operations on objects with a DatetimeIndex. It includes two parameters, before and after, to specify the start and end of our date range. In this lesson, we'll practice calling the method on our financial dataset.
The .truncate() Method
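A minimal sketch of .truncate() on made-up data: rows before the before date and after the after date are dropped, and both endpoints are kept.

```python
import pandas as pd

# Hypothetical daily closing prices
prices = pd.DataFrame(
    {"Close": list(range(10))},
    index=pd.date_range(start="2016-01-01", periods=10, freq="D"),
)

window = prices.truncate(before="2016-01-03", after="2016-01-05")
print(window)
```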
In this lesson, we'll use the pd.DateOffset object to add hours, days, weeks, months, and years to a DatetimeIndex. This is a powerful but slightly hidden feature of the pandas library.
Some of the DateOffset objects are hidden in a semi-secret location in the pandas library - pandas.tseries.offsets. In this lesson, we'll import these objects into our namespace to perform more specific time calculations on our DataFrames.
More Fun with pd.DateOffset Objects
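The two DateOffset lessons can be sketched as follows, using arbitrary dates. MonthEnd is one of the specialized offsets in pandas.tseries.offsets; the lesson may use others.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical index of two consecutive days
dates = pd.date_range(start="2016-01-10", periods=2, freq="D")

plus_days = dates + pd.DateOffset(days=5)      # calendar-aware day shift
plus_month = dates + pd.DateOffset(months=1)   # same day, next month
month_end = dates + MonthEnd()                 # roll forward to month end
print(month_end)
```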
Over the next two lessons, we'll explore the pandas Timedelta object which represents durations. A Timedelta represents a distance of time while a Timestamp represents a specific moment in time.
The pandas Timedelta Object
In this lesson, we'll create a Series of Timedelta objects by calculating the duration differences between two columns of Timestamps. Time difference operations can be easily performed with the subtraction ( - ) sign.
Timedeltas in a Dataset
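The subtraction described above can be sketched with two invented Timestamp columns. The column names are assumptions, not those of the course dataset.

```python
import pandas as pd

# Hypothetical order and delivery dates
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2016-01-01", "2016-01-05"]),
    "delivery_date": pd.to_datetime(["2016-01-04", "2016-01-12"]),
})

# Subtracting two Timestamp columns yields a Series of Timedeltas
orders["Delivery Time"] = orders["delivery_date"] - orders["order_date"]
print(orders)
```

The .dt accessor also works on Timedelta columns, for example orders["Delivery Time"].dt.days.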