Yucatan Peninsula Hurricanes

Week 4 - Discussion section

In this discussion section you will wrangle historical data about hurricanes in the Yucatan Peninsula in Mexico. You will:

Practice breaking down a question into accessible data wrangling steps
Choose appropriate packages and methods to carry out your analysis
Practice finding additional guidance online to carry out your data wrangling plans

Setup

Access the workbench-1 server.
Navigate to the eds-220-sections directory in the file navigation panel and the terminal.
Create a new Python notebook inside your eds-220-sections directory and rename it to section-4-yucatan-hurricanes.ipynb.
Use the terminal to push this file to your remote repository.

General directions

Add comments as appropriate along your code following the course commenting standards.
Include markdown cells in between your code cells to add titles and information to each exercise.
Commit every time you finish a major step. Remember to write your commits in the imperative mood.

About the data

In this discussion section we will use historical data about hurricanes in the Yucatan Peninsula [1].

The Yucatan Peninsula, located in southeastern Mexico and bordered by the Gulf of Mexico and the Caribbean Sea, is a region highly susceptible to hurricanes due to its geographic location. These intense storms bring strong winds, heavy rainfall, and storm surges that significantly impact both terrestrial and coastal ecosystems. Hurricanes can cause widespread deforestation, alter habitats, and disrupt biodiversity, affecting, among other habitats, mangroves and coral reefs.

Hurricane Mitch (Category 5 on the Saffir-Simpson Hurricane Wind Scale) impacted the Yucatan Peninsula in late October 1998, with its effects felt most strongly around October 28-29. The Yucatan experienced heavy rains and flooding as Mitch passed near the region. Image source: NOAA

This dataset includes information about the Saffir-Simpson hurricane category for each hurricane. The Saffir-Simpson scale is a widely used classification system that categorizes hurricanes based on their sustained wind speeds and potential for damage. Ranging from Category 1 to Category 5, the scale assesses the intensity of hurricanes, with Category 1 being the least severe and Category 5 representing catastrophic storms. Categories 3 and above are considered major hurricanes, capable of causing significant structural damage, flooding, and long-lasting environmental impacts.

1. Archive exploration

Take some time to mindfully look through the dataset’s description and metadata.
In your notebook: use a markdown cell to add a brief description of the dataset, including a citation, date of access, and a link to the archive.

2. Data loading and preliminary exploration

We will be using the hf071-01-hurricanes.csv file. Agree with your team on how you will import this file to your notebook and store it in a variable named df.
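
One possible way to do the import is with pandas `read_csv`. The sketch below is only an illustration: it reads a small invented table from memory so that it runs anywhere, and the column names (`code`, `start_date`, `end_date`, `ss`) are placeholders, not the real column names of the archive file. The file path shown in the comment is also an assumption about where you saved the file.

```python
import io

import pandas as pd

# Invented stand-in for the real file: placeholder columns and made-up rows
csv_text = """code,start_date,end_date,ss
A,1900-01-01,1900-01-04,5
B,1950-06-10,1950-06-11,2
C,1990-09-15,1990-09-17,5
"""

# In your notebook, replace the in-memory text with the location of the file,
# for example: df = pd.read_csv("data/hf071-01-hurricanes.csv")  # assumed path
df = pd.read_csv(io.StringIO(csv_text))

# Quick confirmation that the import worked: (rows, columns)
print(df.shape)
```

`read_csv` also accepts a URL, so your team could read the file straight from the archive if you prefer not to store a local copy.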

CHECK IN WITH YOUR TEAM

MAKE SURE YOU’VE ALL SUCCESSFULLY ACCESSED THE DATA BEFORE CONTINUING

Obtain preliminary information and explore this data frame using pandas methods.
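
A few pandas methods that are often useful for a first look are sketched below. The data frame here is an invented stand-in with placeholder column names (`code`, `start_date`, `end_date`, `ss`); run the same calls on your own `df` and check `df.columns` for the real names.

```python
import io

import pandas as pd

# Invented stand-in data with placeholder column names
csv_text = """code,start_date,end_date,ss
A,1900-01-01,1900-01-04,5
B,1950-06-10,1950-06-11,2
C,1990-09-15,1990-09-17,5
"""
df = pd.read_csv(io.StringIO(csv_text))

# First rows of the data frame
print(df.head())

# Column names, non-null counts, and data types in one summary
df.info()

# Data type of each column (dates read from a CSV are not datetimes by default)
print(df.dtypes)

# Distinct values in the (hypothetical) category column
print(df["ss"].unique())

# Number of missing values per column
print(df.isna().sum())
```

Pay attention to the data types: whether the date columns are stored as text or as datetimes will affect how you compute durations later.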

3. Brainstorm

In this session we want to answer the following question:

How many hurricanes with Saffir-Simpson category 5 have been registered and what was their duration?

Individually, write down step-by-step instructions on how you would wrangle the df data frame to answer the question. Do not code anything yet. Remember: It’s okay if you don’t know how to code each step. The important thing is to have an idea of what you’d like to do.
Discuss your step-by-step instructions with your team. What do you see as potential challenges to implementing your plan?
As a team, select an initial data wrangling plan for answering the question.

4. Data wrangling

Use your plan as a starting point to answer the question.

You may (or may not) need to look online to carry out some of the steps in your plan. It is completely fine to seek help online! Resourceful troubleshooting is a key skill in data science.
It’s ok if your initial plan changes as you work with the data and discuss challenges with your team! This brainstorm is to create a shared starting point.
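
As a reference point, here is a minimal sketch of one plan your team might arrive at: convert the date columns to datetimes, keep only the category 5 rows, and subtract the start date from the end date. It is not the only valid approach. The data and column names (`code`, `start_date`, `end_date`, `ss`) are invented placeholders, so adapt them to what you find in the real file.

```python
import io

import pandas as pd

# Invented stand-in data with placeholder column names
csv_text = """code,start_date,end_date,ss
A,1900-01-01,1900-01-04,5
B,1950-06-10,1950-06-11,2
C,1990-09-15,1990-09-17,5
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: convert the date columns from text to datetime
df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])

# Step 2: keep only the category 5 hurricanes (copy to avoid modifying a view)
cat5 = df[df["ss"] == 5].copy()

# Step 3: duration as the difference between end and start dates (a Timedelta)
cat5["duration"] = cat5["end_date"] - cat5["start_date"]

# Number of category 5 hurricanes and their durations
print(len(cat5))
print(cat5[["code", "duration"]])
```

Whether a hurricane that starts and ends on the same day lasted zero days or one day is a choice to discuss with your team; the subtraction above does not count the end date.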

5. Collect your code and explain your results

In a new code cell, collect all the relevant code to create a streamlined workflow to obtain the final data to answer the question. Your code cell should:

Only print the final results.
Not include output from intermediate variables or checks.
Not include methods or functions that do not directly contribute to the analysis, even if they don’t print anything (e.g., df.head()).
If appropriate, combine methods using code chaining instead of creating intermediate variables.
Comment your code following our class comments guidelines.
Use appropriate line breaks and indentation to make code readable.
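
The sketch below shows what a chained workflow following these guidelines could look like. It again uses invented data and placeholder column names (`code`, `start_date`, `end_date`, `ss`, plus a new `duration_days` column), so treat it as a pattern to adapt and not as the expected solution.

```python
import io

import pandas as pd

# Invented stand-in data with placeholder column names
csv_text = """code,start_date,end_date,ss
A,1900-01-01,1900-01-04,5
B,1950-06-10,1950-06-11,2
C,1990-09-15,1990-09-17,5
"""

cat5 = (
    # Read the data, parsing the date columns as datetimes on import
    pd.read_csv(io.StringIO(csv_text), parse_dates=["start_date", "end_date"])
    # Keep only category 5 hurricanes
    .query("ss == 5")
    # Duration in whole days between start and end dates
    .assign(duration_days=lambda d: (d["end_date"] - d["start_date"]).dt.days)
    # Keep only the columns needed to answer the question
    .loc[:, ["code", "start_date", "duration_days"]]
)

# Only the final result is printed
print(cat5)
```

Wrapping the chain in parentheses lets each method sit on its own line with its own comment, which keeps long chains readable.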

Write a full sentence explaining your answer to the question in (3). Don’t forget to include units. You may also want to include any insights about the rest of the data for the Category 5 hurricanes.

BONUS: Visualize Saffir-Simpson categories across time

Create a scatter plot of the start date of the hurricanes against the Saffir-Simpson scale. Use matplotlib to customize your graph, including updating the tick labels to be only 1, 2, 3, 4, and 5 since the Saffir-Simpson scale does not take decimal values.
Analyze your plot and write (in full sentences!) any trends that you observe.
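
A minimal sketch of such a plot is shown below, using matplotlib's `Axes.scatter` and `Axes.set_yticks`. The data and the column names (`start_date`, `ss`) are invented placeholders; the labels and figure size are just one reasonable choice.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in data with placeholder column names
csv_text = """code,start_date,end_date,ss
A,1900-01-01,1900-01-04,5
B,1950-06-10,1950-06-11,2
C,1990-09-15,1990-09-17,5
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["start_date"])

fig, ax = plt.subplots(figsize=(8, 4))

# One point per hurricane: start date on x, category on y
ax.scatter(df["start_date"], df["ss"])

# Categories are whole numbers, so show only the ticks 1 through 5
ax.set_yticks([1, 2, 3, 4, 5])

ax.set_xlabel("Start date")
ax.set_ylabel("Saffir-Simpson category")
ax.set_title("Hurricane categories over time (invented data)")
```

In a notebook the figure is displayed when the cell finishes running; in a script you would add `plt.show()` at the end.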

Don’t forget to commit, pull, and push.

References

[1] E. Boose and D. Foster, “Ecological Impacts of Hurricanes Across the Yucatan Peninsula 1851-2000.” Environmental Data Initiative, 2023. doi: 10.6073/PASTA/F219113373913F2DAF421732E28D3C38. Available: https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-hfr.71.23. [Accessed: Oct. 23, 2024]