
Exploratory Data Analysis Project using pandas: Summer Olympics + Winter Olympics + Population + GDP Olympics
This project is based on "The Complete Pandas Bootcamp 2023 - Data Science with Python," a Udemy course taught by Alexander Hagmann. It focuses on Exploratory Data Analysis (EDA), a crucial step in the data analysis process that involves summarizing, visualizing, and understanding the main characteristics and patterns within a dataset.

The primary objectives of EDA in this project:

o Data Inspection: Import the datasets Summer (summer.csv), Winter (winter.csv), and dictionary (dictionary.csv), and inspect them.

o Merge and Concatenate:
  1. Combine Summer and Winter (one row for each medal awarded in any Olympic Games) and save the resulting DataFrame in olympics.
  2. Add a column (e.g. "Edition") that indicates the edition: Summer or Winter.
  3. Add the full country name from the dictionary to olympics (e.g. France for FRA).

o Data Cleaning:
  1. Remove spaces from the column headers in dictionary.
  2. For some country codes (e.g. "URS"), no full country name is available in the dictionary, which leaves missing values in olympics. Identify these country codes, look up the full country names on the web, and replace the missing values in the Country column.
  3. Remove rows from olympics where the country code is unknown. (Make sure to reset the index to a RangeIndex.)
  4. Convert the Medal column into an ordered categorical column ("Bronze" < "Silver" < "Gold").

o Exploratory Data Analysis:
  • Do GDP, Population, and Politics matter?
    1. Create an aggregated and merged DataFrame with the Top 50 countries. The column Total_Games shows the number of participations (as an approximation: the number of editions in which a country has won at least one medal).
    2. Convert the absolute values in the DataFrame into ranks and save the ranks DataFrame in a new variable.
  • Statistical Analysis and Hypothesis Testing with scipy: In the following, work with ranks.
    Check whether GDP (standard of living), Total_Games (a measure of political stability), and Population (size) have an effect on total medals. Use Spearman correlation, not Pearson correlation. This part tests whether population, GDP per capita, and the number of participations have a statistically significant influence on a country's success in the Olympic Games.
  • Medals Heatmap by Gender and Edition: Create a Seaborn heatmap with medal ranks for the Top 50 countries (Total Medals, Summer Games Medals, Winter Games Medals, Men, Women).
  • Summer Games vs. Winter Games - does Geographical Location matter? Identify countries that are equally successful in Summer and Winter Games, more successful in Summer Games, or more successful in Winter Games. What could be the reasons?
    1. First, compare summer athletes with winter athletes within the same country: which group won more medals?
    2. Second, compare summer athletes in one country with summer athletes in another country.
  • Men vs. Women - does Culture & Religion matter? Identify countries where men and women are equally successful, where men are more successful, and where women are more successful. What could be the reasons?
    1. First, compare men with women within the same country: who won more medals?
    2. Second, compare the men in one country with the men in another country, then the women in one country with the women in another country.
  • Do Traditions matter? Create a Seaborn heatmap that shows the ranks of the Top 50 countries by sport. Identify traditional or national sports for e.g. the UK and China.
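The Merge and Concatenate and Data Cleaning steps listed above can be sketched on a few made-up rows. This is only an illustration under assumptions: the tiny DataFrames stand in for summer.csv, winter.csv, and dictionary.csv, the column names ("Country" holding the code in the medal files, "Country"/"Code"/"GDP per Capita" in the dictionary) are assumed rather than taken from the real files, and the helper column "Country_Name" is a name chosen here for clarity.

```python
import pandas as pd

# Toy stand-ins for summer.csv, winter.csv and dictionary.csv.
# All rows and numbers are made up; column names are assumptions.
summer = pd.DataFrame({
    "Year": [1896, 1980, 2012],
    "Country": ["FRA", "URS", "USA"],
    "Medal": ["Gold", "Silver", "Bronze"],
})
winter = pd.DataFrame({
    "Year": [1924, 2014],
    "Country": ["FRA", None],  # None = unknown country code
    "Medal": ["Bronze", "Gold"],
})
dictionary = pd.DataFrame({
    "Country": ["France", "United States"],
    "Code": ["FRA", "USA"],
    "GDP per Capita": [100.0, 200.0],  # placeholder values
})

# Data Cleaning 1: remove spaces from the dictionary's column headers.
dictionary.columns = dictionary.columns.str.replace(" ", "_")

# Merge and Concatenate 1 + 2: stack both editions, tagging each row.
olympics = pd.concat(
    [summer.assign(Edition="Summer"), winter.assign(Edition="Winter")],
    ignore_index=True,
)

# Merge and Concatenate 3: add the full country name via a left merge.
names = dictionary[["Country", "Code"]].rename(columns={"Country": "Country_Name"})
olympics = olympics.merge(
    names, how="left", left_on="Country", right_on="Code"
).drop(columns="Code")

# Data Cleaning 2: codes that exist but have no full name in the dictionary.
no_name = olympics["Country_Name"].isna()
missing_codes = olympics.loc[no_name, "Country"].dropna().unique()
manual_names = {"URS": "Soviet Union"}  # looked up by hand
olympics["Country_Name"] = olympics["Country_Name"].fillna(
    olympics["Country"].map(manual_names)
)

# Data Cleaning 3: drop rows with an unknown code, rebuild a RangeIndex.
olympics = olympics.dropna(subset=["Country"]).reset_index(drop=True)

# Data Cleaning 4: ordered categorical, Bronze < Silver < Gold.
olympics["Medal"] = pd.Categorical(
    olympics["Medal"], categories=["Bronze", "Silver", "Gold"], ordered=True
)

print(olympics)
```

With the real files, the three toy DataFrames would presumably be replaced by pd.read_csv calls, and manual_names would need one entry per code found in missing_codes.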
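The aggregation, ranking, and Spearman test described under "Do GDP, Population, and Politics matter?" and "Statistical Analysis and Hypothesis Testing with scipy" might look roughly like the sketch below. It runs on made-up data, so the numbers say nothing about the real Olympics; the column names "Total", "Population", and "GDP", the country codes, and the 5% significance level are all assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Made-up medal table: one row per medal (codes and years are fictional).
olympics = pd.DataFrame({
    "Country": ["AAA"] * 4 + ["BBB"] * 3 + ["CCC"] * 2 + ["DDD"],
    "Year": [2000, 2002, 2004, 2004, 2000, 2000, 2004, 2002, 2002, 2004],
    "Edition": ["Summer", "Winter", "Summer", "Summer", "Summer",
                "Summer", "Summer", "Winter", "Winter", "Summer"],
})
# Made-up dictionary with placeholder population and GDP values.
dictionary = pd.DataFrame({
    "Code": ["AAA", "BBB", "CCC", "DDD"],
    "Population": [10, 40, 30, 20],
    "GDP": [300, 400, 200, 100],
})

# Total medals per country.
total = olympics.groupby("Country").size().rename("Total")

# Total_Games approximation: editions with at least one medal.
# Year + Edition together identify one Games.
games = (
    olympics.drop_duplicates(["Country", "Year", "Edition"])
    .groupby("Country").size().rename("Total_Games")
)

# Aggregated and merged table, limited to the Top 50 by total medals.
top50 = (
    pd.concat([total, games], axis=1)
    .join(dictionary.set_index("Code"))
    .nlargest(50, "Total")
)

# Absolute values -> ranks (rank 1 = largest value, ties share the average).
ranks = top50.rank(ascending=False)

# Spearman test per factor. H0: no monotonic association with Total.
alpha = 0.05  # assumed significance level
results = {}
for factor in ["GDP", "Population", "Total_Games"]:
    rho, p_value = stats.spearmanr(ranks["Total"], ranks[factor])
    results[factor] = (rho, p_value)
    verdict = "reject H0" if p_value < alpha else "cannot reject H0"
    print(f"{factor}: rho={rho:.2f}, p={p_value:.3f} -> {verdict}")
```

Spearman correlation is itself computed on ranks, so passing the raw top50 columns to stats.spearmanr would give the same coefficients; working with the ranks DataFrame simply keeps the analysis consistent with the heatmaps.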
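For the rank heatmap and the Summer vs. Winter and Men vs. Women comparisons, one possible approach is to cross-tabulate medals per country, rank each column, and compare the ranks. The sketch below uses fictional rows; the "Gender" column with values "Men"/"Women", the min tie-breaking rule, and the helper plot_rank_heatmap are assumptions made here, not part of the original project.

```python
import pandas as pd

# Fictional medal rows (one row per medal).
olympics = pd.DataFrame({
    "Country": ["AAA"] * 5 + ["BBB"] * 3 + ["CCC"] * 2,
    "Edition": ["Summer", "Summer", "Summer", "Winter", "Summer",
                "Winter", "Winter", "Winter", "Summer", "Winter"],
    "Gender": ["Men", "Men", "Women", "Men", "Women",
               "Women", "Women", "Men", "Men", "Men"],
})

# Medal counts per country, split by edition and by gender.
by_edition = pd.crosstab(olympics["Country"], olympics["Edition"])
by_gender = pd.crosstab(olympics["Country"], olympics["Gender"])
table = pd.concat([by_edition, by_gender], axis=1)
table.insert(0, "Total", by_edition.sum(axis=1))

# Top 50 by total medals, then ranks per column (1 = most medals).
top50 = table.nlargest(50, "Total")
ranks = top50.rank(ascending=False, method="min")

# Rank gaps: negative = better rank in Summer (or for Men), 0 = equal.
edition_gap = ranks["Summer"] - ranks["Winter"]
gender_gap = ranks["Men"] - ranks["Women"]
print(ranks)
print(edition_gap)


def plot_rank_heatmap(rank_table):
    """Draw the ranks as an annotated Seaborn heatmap (hypothetical helper)."""
    # Imported lazily so the table code above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(8, 12))
    # Reversed colormap so that rank 1 gets the "good" end of the scale.
    sns.heatmap(rank_table, annot=True, fmt=".0f", cmap="RdYlGn_r", ax=ax)
    return ax
```

Calling plot_rank_heatmap(ranks) would draw the heatmap. The same crosstab-then-rank pattern could be applied to a sport column for the traditions question, assuming such a column exists in the data.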