Assignment 2

This assignment covers topics in the notes up to the time series lesson. Task 1 will contribute 20% to the total grade of the assignment and tasks 2 and 3 will contribute 40% each.

Submission instructions

This assignment is due by 11:59 pm on Saturday, October 25. All tasks for this assignment should be submitted via Gradescope. Make sure you double-check your submission to ensure it satisfies all the items in these checklists:

  • File formatting and uploading:
  • Notebook content checklists:

Resubmissions after the due date that result from not satisfying one of the checks above will be strictly held to the course’s 50%-regrade resubmission policy (see syllabus).

If you have any questions about assignment logistics, please reach out to the instructional team by 5 pm Friday, October 24.

Rename homework notebooks before uploading them to Gradescope

For your upcoming assignment submission, you’ll be downloading your notebooks and then uploading them to Gradescope. Before you upload your finished notebooks to Gradescope, please rename your notebooks so they are called

  • hwk2-task2-salmon-YOURLASTNAME.ipynb and
  • hwk2-task3-aqi-YOURLASTNAME.ipynb.

It’s important to do this so we can keep track of resubmissions.

Thanks!
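If you prefer to build the new file names programmatically (for example, from a scratch notebook on the server), here is a minimal Python sketch. The helper `submission_name` is hypothetical and not part of the course materials. It only inserts `-LASTNAME` before the `.ipynb` extension and assumes the notebooks still have their original names.

```python
from pathlib import Path


def submission_name(notebook: str, last_name: str) -> str:
    """Return the notebook name with '-<last_name>' inserted before '.ipynb'.

    Hypothetical helper for illustration: it only builds the new name and
    does not touch any file on disk.
    """
    path = Path(notebook)
    if path.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got {notebook!r}")
    # Path.stem drops the extension, e.g. 'hwk2-task2-salmon'
    return f"{path.stem}-{last_name}{path.suffix}"


# Usage: print the names to use when uploading ('Smith' is a placeholder).
for nb in ["hwk2-task2-salmon.ipynb", "hwk2-task3-aqi.ipynb"]:
    print(submission_name(nb, "Smith"))
```

To rename a file on disk, you could pass the result to `Path(nb).rename(...)`. Either way, double-check the final names before uploading to Gradescope.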

Otter Grader Checks

These notebooks use the otter library and include self-contained checks to help you confirm you are on the right track. Follow the steps below to run the Otter Grader checks.

  1. Run the very first cell of the notebook, the one that contains #Initialize Otter.
  2. Fill in code where the notebook instructs you to do so.
  3. When you encounter a grader.check() cell, like the one in the image below, run the cell. Doing so will either return an emoji with the text “q# passed!” or tell you that the test did not pass. If your test did not pass, reread the prompt and make sure your output is what you would expect. If you can’t figure out where the issue is, discuss it with other people (always the first option!), use Slack, or come see Annie or Carmen during student hours.

[Image: example of a grader.check() cell and its output]

  4. The very last cell of the notebook contains grader.check_all(). This runs all tests in the notebook and reports which tests passed and which failed. If the output of this cell looks like the image below (question numbers may differ), you passed all the autograder checks!

[Image: example output of the final cell when all autograder checks pass]

Reminders:

  • Make sure you’re keeping up with your classmates’ questions and answers on Slack.
  • When submitting your final notebook, follow the instructions above regarding how to name the notebook.

AI Policy

If you use generative AI on this assignment, you are expected to adhere to the following course policies:

  • ✅ Cultivate understanding: You should be able to fully understand, justify, and explain all the work you submit.
  • 🤔 Question AI outputs: The default should be to assume the answers you get from generative AI are incorrect and you must verify any information the platform generates.
  • 🚫 Academic integrity: Submitting work you don’t understand or can’t explain or justify will be considered plagiarism, regardless of whether you have disclosed the use of generative AI or not.
  • 📄 Document any AI use: If you do end up using generative AI in your work, you will need to complete and submit the Generative AI Use Documentation form and upload it to the “Generative AI Documentation” portal on Gradescope.

If there are concerns about AI use in your work, your instructor will ask you to meet and talk it through. If understanding is clearly lacking and this is the first time this happens, you’ll have the chance to revise and resubmit your work for 50% of the original maximum grade within two days.

Task 1: Better Science in Less Time Using Open Data Science Tools reading

Using the right tools for a reproducible, efficient, and shareable workflow can be transformational. The article Our Path to Better Science in Less Time Using Open Data Science Tools [1] recounts how switching to open data science tools made it possible to transform the Ocean Health Index into an updatable and adaptable project. Although the paper focuses on R, its lessons go well beyond this programming language and apply to anyone seeking to improve the reproducibility of their data analyses.

Want to hear more about paths to open science, accompanied by beautiful illustrations? Check out this talk, where Dr. Allison Horst and lead author Dr. Julie Lowndes share their personal journeys toward open science and introduce the Openscapes program!

Read the paper and write a one-paragraph (between 100 and 150 words) reflection about it. Review the rubric for this assignment here. Answer at least one of the following questions for your reflection:

  1. In your previous working experience, have you been working with reproducibility in mind? Which tools have allowed you or prevented you from making your work reproducible?
  2. The paper presents different strategies for learning intentionally. Have you used any of these strategies? Could you adopt some as you progress in your courses and career?
  3. What do the authors see as the role of Git and GitHub in supporting reproducibility, transparency, and communication? Is your experience using these tools similar?
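To sanity-check the 100 to 150 word limit before submitting, a rough Python sketch like the one below counts words by splitting on whitespace. How words are counted is an assumption here: hyphenated terms, numbers, and citations may be counted differently by your text editor or by the instructional team, and the limits are assumed to be inclusive. The function names are hypothetical.

```python
def word_count(text: str) -> int:
    """Count words as runs of non-whitespace characters."""
    return len(text.split())


def within_limits(text: str, low: int = 100, high: int = 150) -> bool:
    """Return True if the text has between `low` and `high` words, inclusive."""
    return low <= word_count(text) <= high


# Usage: paste your reflection between the triple quotes.
reflection = """Your one-paragraph reflection goes here."""
print(word_count(reflection), within_limits(reflection))
```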

Ready to submit your answer? Make sure your submission follows the checklist at the top of the assignment!

Setup for tasks 2 and 3

  1. Fork this repository: https://github.com/MEDS-eds-220/eds220-hwk2

  2. On the workbench-1 server, start a new JupyterLab session or access an active one.

  3. Using the terminal, clone your fork of the eds220-hwk2 repository into your eds-220 directory.

Task 2: Wrangling Alaska salmon catch data

This exercise is based on the Cleaning and Wrangling Data in R lesson by the NCEAS Learning Hub [2].

In this task you will use simplified data from the Alaska Department of Fish & Game containing commercial salmon catch data from 1878 to 1997 [3]. The original data can be accessed from the KNB repository.

Follow the instructions in the notebook hwk2-task2-salmon.ipynb to complete this task. Review the rubric for this assignment here. In this task you will practice:

  • detecting and wrangling messy data
  • updating column data types
  • obtaining summary statistics by groups
  • creating exploratory plots
  • creating a continuous, polished workflow
  • version control with git following best practices
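The first few skills in this list can be sketched with pandas on a tiny made-up table. The values, region codes, and the non-numeric entry `'I'` below are invented for illustration and are not taken from the ADF&G data; the notebook's actual columns and fixes may differ. The sketch flags entries that cannot be read as numbers with `pd.to_numeric(..., errors="coerce")`, updates the column type with `astype`, and computes group means with `groupby`.

```python
import pandas as pd

# Made-up toy data (NOT the ADF&G data): 'Chinook' holds text because
# one entry is not a number, so pandas cannot store it as integers.
catch = pd.DataFrame({
    "Region": ["SSE", "SSE", "NSE", "NSE"],
    "Year": [1990, 1991, 1990, 1991],
    "Chinook": ["10", "I", "4", "6"],
    "Sockeye": [5, 7, 3, 1],
})

# 1. Detect messy values: entries that fail numeric conversion become NaN.
as_numbers = pd.to_numeric(catch["Chinook"], errors="coerce")
bad_rows = catch[as_numbers.isna()]
print(bad_rows)

# 2. Fix the entry (assumption for this toy example: 'I' is a typo for 1)
#    and update the column's data type.
catch.loc[catch["Chinook"] == "I", "Chinook"] = "1"
catch["Chinook"] = catch["Chinook"].astype("int64")

# 3. Summary statistics by group: mean catch per region.
summary = catch.groupby("Region")[["Chinook", "Sockeye"]].mean()
print(summary)
```

For a quick exploratory plot of the summary, `summary.plot(kind="bar")` should work if matplotlib is installed.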

Ready to submit your answers? Make sure your submission follows the checklist at the top of the assignment!

Task 3: Visualizing AQI during the 2017 Thomas Fire in Santa Barbara County

In this task you will use Air Quality Index (AQI) data from the US Environmental Protection Agency to visualize the impact of the 2017 Thomas Fire on the AQI in Santa Barbara County. The Thomas Fire, which burned across Santa Barbara and Ventura counties in December 2017, was one of California’s largest wildfires, burning over 280,000 acres of land, destroying wildlife habitats, and leading to soil erosion and increased flood risks in the region.

Flames from the Thomas Fire burn down the face of the ridge above Highway 101 in the area of Seacliff, Solimar Beach and Faria Beach west of Ventura. Photo credit: ©Ray Ford / Noozhawk photo.

Follow the instructions in the notebook hwk2-task3-aqi.ipynb to complete this task. Review the rubric for this assignment here. In this task you will practice:

  • date and string data wrangling
  • combining multiple data frames
  • visualizing time series
  • creating a continuous, polished workflow
  • version control with git following best practices
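Here is a minimal pandas sketch of these skills, using made-up AQI values and column names that are assumptions rather than the exact columns of the EPA files. It combines two yearly data frames with `pd.concat`, cleans the column names with string methods, parses dates with `pd.to_datetime`, and computes a time-based rolling mean. The 3-day window is chosen only so the tiny example has something to average; follow the notebook for the filtering and window it actually asks for.

```python
import pandas as pd

# Made-up daily AQI values (NOT EPA data); column names are assumptions.
aqi_2017 = pd.DataFrame({
    "County Name": ["Santa Barbara", "Santa Barbara", "Ventura"],
    "Date": ["2017-12-30", "2017-12-31", "2017-12-31"],
    "AQI": [60, 80, 90],
})
aqi_2018 = pd.DataFrame({
    "County Name": ["Santa Barbara", "Santa Barbara", "Santa Barbara"],
    "Date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "AQI": [40, 30, 50],
})

# Combine the yearly data frames into one, renumbering the rows.
aqi = pd.concat([aqi_2017, aqi_2018], ignore_index=True)

# String wrangling: convert column names to lower_snake_case.
aqi.columns = aqi.columns.str.lower().str.replace(" ", "_")

# Keep one county, parse the date strings, and use them as a sorted index.
sb = aqi[aqi["county_name"] == "Santa Barbara"].copy()
sb["date"] = pd.to_datetime(sb["date"])
sb = sb.set_index("date").sort_index()

# Time-based rolling mean: each value averages the AQI observed in the
# 3 calendar days ending on that date (window size is illustrative).
sb["aqi_rolling"] = sb["aqi"].rolling("3D").mean()
print(sb)
```

Because the window is given as an offset string (`"3D"`), pandas averages over calendar days rather than a fixed number of rows, which matters when some dates are missing. Offset-based windows require a sorted `DatetimeIndex`, hence the `sort_index()` call.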

Ready to submit your answers? Make sure your submission follows the checklist at the top of the assignment!

References

[1] J. S. S. Lowndes et al., “Our path to better science in less time using open data science tools,” Nature Ecology & Evolution, vol. 1, no. 6, p. 0160, May 2017, doi: 10.1038/s41559-017-0160. Available: https://www.nature.com/articles/s41559-017-0160. [Accessed: Oct. 18, 2024]
[2] H. Do-Linh, C. Galaz García, M. B. Jones, and C. Vargas Poulsen, Open Science Synthesis training Week 1. NCEAS Learning Hub & Delta Stewardship Council. 2023. Available: https://learning.nceas.ucsb.edu/2023-06-delta/
[3] M. Byerly, “Alaska commercial salmon catches by management region (1886- 1997).” 2016. Available: https://knb.ecoinformatics.org/view/df35b.304.2