Data Aggregation Project using pandas: Clean the Summer dataset according to Sport Experts and Aggregate the results

### Project Overview

- **Course:** "The Complete Pandas Bootcamp 2023 - Data Science with Python"
- **Platform:** Udemy
- **Instructor:** Alexander Hagmann
- **Project Title:** Summer Olympic Games Medal Tables Aggregation

### Introduction

This project tackles a data aggregation challenge commonly encountered in job applications and assessment centers within the Data Science field. The primary task involves manipulating and interpreting a vast dataset to generate the Medal Tables for the Summer Olympic Games spanning from 1896 to 2012.

### Project Goals and Objectives

Upon joining a Data Science advisory firm, your first assignment is to recreate the official Medal Tables for all editions of the Summer Olympic Games. This entails utilizing datasets like `summer.csv`, which includes over 31,000 medal entries, and aligning your results with the official Medal Tables from the 1996 and 1976 Olympics, extracted from Wikipedia (`wik_1996.csv`, `wik_1976.csv`).

**Challenge:** Aim to minimize the total absolute divergence between your aggregated Medal Tables and the official ones, with the goal of achieving an optimal score of 0. For example, if the official Gold Medal count for the United States in 1996 is 44, and your calculation gives 46, this results in an absolute divergence of 2.

### Key Insights

- **Team and Singles Events:** In team events, a medal won by any team counts as a single medal irrespective of the number of team members. In singles events, each awarded medal counts individually, even when medals are shared.
- **Event Categories:** Medals are differentiated into Men's, Women's, and Mixed Events. Specific criteria determine Mixed Events, including all "Equestrian" and "Sailing" events before 1988, as well as certain medals in Badminton mixed doubles.

### Valuable Perspectives

Incorporating insights from sports experts is crucial, particularly in understanding how medals are structured and distributed across different event types. This requires deep data analysis and strategic thinking about data structures.

### Conclusion

This project tests both coding proficiency and the ability to integrate expert knowledge and data interpretation to solve complex challenges in Data Science. The emphasis on "Thinking in Data Structures" is a key skill for any aspiring data scientist.
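
The team-event rule and the divergence score described under Key Insights can be illustrated with a small pandas sketch. This is not the course's solution: the toy rows, the country codes `AAA`/`BBB`, the column names, and in particular the boolean `IsTeam` column are all assumptions made for illustration (in the real `summer.csv`, the team/singles distinction would have to be derived, and Mixed Events are not handled here).

```python
import pandas as pd

# Toy stand-in for summer.csv. Column names and the IsTeam flag are
# hypothetical; they are not confirmed by the project description.
medals = pd.DataFrame(
    {
        "Year": [1996] * 6,
        "Country": ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB"],
        "Gender": ["Men"] * 6,
        "Event": ["Team Event"] * 3 + ["Singles Event"] * 3,
        "Athlete": ["a1", "a2", "a3", "a4", "a5", "a6"],
        "Medal": ["Gold", "Gold", "Gold", "Silver", "Bronze", "Bronze"],
        "IsTeam": [True, True, True, False, False, False],
    }
)


def medal_table(df, year):
    """Build a Country x Medal table for one year.

    Team events count once per team; singles medals count individually.
    """
    one_year = df[df["Year"] == year]
    keys = ["Country", "Gender", "Event", "Medal"]
    # Collapse all members of a team into a single medal row.
    team = one_year[one_year["IsTeam"]].drop_duplicates(subset=keys)
    # Keep every singles row, so shared medals are each counted.
    singles = one_year[~one_year["IsTeam"]]
    per_medal = pd.concat([team, singles])
    table = per_medal.groupby(["Country", "Medal"]).size().unstack(fill_value=0)
    return table.reindex(columns=["Gold", "Silver", "Bronze"], fill_value=0)


def total_abs_divergence(mine, official):
    """Sum of absolute cell-wise differences between two medal tables."""
    # Align on the union of countries and medal columns, treating gaps as 0.
    mine_a, official_a = mine.align(official, join="outer", fill_value=0)
    return int((mine_a - official_a).abs().to_numpy().sum())


table = medal_table(medals, 1996)

# Hypothetical "official" table that disagrees by one bronze medal for BBB.
official = pd.DataFrame(
    {"Gold": [1, 0], "Silver": [1, 0], "Bronze": [0, 1]},
    index=["AAA", "BBB"],
)

score = total_abs_divergence(table, official)
print(table)
print("Total absolute divergence:", score)
```

In this toy data the three team members of `AAA` yield one gold medal, while the two singles bronzes for `BBB` are both kept, so the score against the made-up official table is 1. A score of 0 would mean the aggregated table matches the official one exactly.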
