Who played the most total days of Survivor? If a contestant appeared on more than one season, add their days from each season together. Save the top five contestants by total days played as a DataFrame called Q7, sorted in descending order by total days played.

The following columns should be included: castaway_id, full_name, and total_days_played where total_days_played is the sum of all days a contestant played. The index should go from 0 to n-1.

Note: Be careful because on some seasons, the contestant was allowed to come back into the game after being voted off. Take a look at Season 23's contestant Oscar Lusth in the castaways DataFrame as an example. He was voted out 7th and then returned to the game. He was then voted out 9th and returned to the game a second time. He was then voted out 17th the final time. Be aware of this in your calculations and make sure you are counting the days according to the last time they were voted off or won.

1 answer

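To solve this task, we need the total number of days each contestant played across all of their seasons. Because a contestant who returned to the game has several rows within one season, we cannot simply sum every row. Instead, we first take the last day each contestant reached in each season of the castaways table, then sum those per-season values for each castaway_id. After that, we attach each contestant's full name, sort in descending order by total days played, and keep the top five.

The code below implements these steps. It assumes the castaways table has a season column and a day column holding the day of each exit; adjust the names if your copy of the data differs:

```python
import pandas as pd

# Assumed columns: castaways has one row per exit, with `season` and `day` (the day of that exit).
# Take each contestant's last day in each season so returnees are counted only once.
last_day = castaways.groupby(['castaway_id', 'season'])['day'].max()

# Sum the per-season totals for each contestant
total_days = last_day.groupby(level='castaway_id').sum().rename('total_days_played').reset_index()

# Attach each contestant's name (one row per castaway_id to avoid duplicate matches)
names = castaways[['castaway_id', 'full_name']].drop_duplicates('castaway_id')
Q7 = names.merge(total_days, on='castaway_id')

# Sort in descending order by total days played and keep the top five
Q7 = Q7.sort_values('total_days_played', ascending=False).head(5).reset_index(drop=True)
```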

The resulting DataFrame Q7 will have the following columns: castaway_id, full_name, and total_days_played. The index will go from 0 to n-1, where n is the number of rows in the DataFrame.
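
To check the returnee case from the note in the question, here is a minimal sketch on made-up data. The contestants, IDs, and day values are invented for illustration, and the `season` and `day` column names and the helper `total_days_played` are assumptions rather than anything confirmed about the real tables. It compares the "last day per season" approach with a naive sum over every row:

```python
import pandas as pd

# Toy data, not real Survivor results. Column names `season` and `day` are assumed.
# X1 is voted out twice and returns both times in season 1 (three rows, one season).
toy = pd.DataFrame({
    'castaway_id': ['X1', 'X1', 'X1', 'X2', 'X2', 'X3'],
    'full_name': ['Ann Example'] * 3 + ['Bob Example'] * 2 + ['Cy Example'],
    'season': [1, 1, 1, 1, 2, 2],
    'day': [10, 20, 39, 30, 12, 25],
})

def total_days_played(castaways, top_n=5):
    """Hypothetical helper: sum each contestant's last day per season."""
    # Last day reached in each season, so repeated exits are not double counted
    last_day = castaways.groupby(['castaway_id', 'season'])['day'].max()
    totals = (last_day.groupby(level='castaway_id').sum()
              .rename('total_days_played').reset_index())
    # One name per castaway_id so the merge does not duplicate rows
    names = castaways[['castaway_id', 'full_name']].drop_duplicates('castaway_id')
    out = names.merge(totals, on='castaway_id')
    return (out.sort_values('total_days_played', ascending=False)
               .head(top_n).reset_index(drop=True))

result = total_days_played(toy)

# Naive version for comparison: sums every row, including earlier exits
naive = toy.groupby('castaway_id')['day'].sum()

print(result)
print(naive)
```

In this toy data the naive sum credits X1 with 10 + 20 + 39 days, while the per-season maximum credits only the 39 days of the final exit, which is the behavior the question asks for.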