Assignment 1

This assignment covers topics in the notes from the Python review to the plotting with pandas lesson. Task 1 will contribute 20% to the total grade of the assignment and tasks 2 and 3 will contribute 40% each.

Submission instructions

This assignment is due by 11:59 pm on Saturday, September 11. All tasks for this assignment should be submitted via Gradescope. Make sure you double-check your submission to ensure it satisfies all the items in this checklist:

Resubmissions made after the due date because a submission did not satisfy one of the checks above will be held strictly to the course’s 50%-regrade resubmission policy (see syllabus).

If you have any questions about assignment logistics, please reach out to the instructional team by 5 pm on Thursday, September 9.

Rename homework notebooks before uploading them to Gradescope

To submit this assignment, you will download your notebooks and then upload them to Gradescope. Before uploading your finished notebooks, please rename them so they are called

  • hwk1-task2-corals-YOURLASTNAME.ipynb and
  • hwk1-task3-earthquakes-YOURLASTNAME.ipynb.

It’s important to do this so we can keep track of resubmissions.

Thanks!
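If you prefer to build the new file names programmatically, the renaming convention above can be sketched as follows. This is only an illustration: the helper `submission_name` and the last name "Garcia" are hypothetical and not part of the course materials.

```python
from pathlib import Path


def submission_name(notebook: str, last_name: str) -> str:
    """Return the notebook file name with -LASTNAME inserted before the extension.

    Hypothetical helper for illustration only.
    """
    path = Path(notebook)
    return f"{path.stem}-{last_name.strip()}{path.suffix}"


# "Garcia" is a placeholder last name.
print(submission_name("hwk1-task2-corals.ipynb", "Garcia"))
print(submission_name("hwk1-task3-earthquakes.ipynb", "Garcia"))
```

To rename a file on disk, the result could be passed to `Path.rename`, for example `Path("hwk1-task2-corals.ipynb").rename(submission_name("hwk1-task2-corals.ipynb", "Garcia"))`.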

Otter Grader Checks

These notebooks use the otter library and have self-contained checks to help ensure you are on the right track. Follow the steps below to run the otter grader checks.

  1. Run the very first cell of the notebook that contains #Initialize Otter.
  2. Fill in code where the notebook instructs you to do so.
  3. When you encounter a grader.check() cell, like the image below, run the cell. Doing so will either return an emoji with the text “q# passed!”, or it will tell you the test did not pass. If your test did not pass, try rereading the prompt and making sure your output is what you would expect. If you can’t figure out where the issue is, discuss it with other people (first option always!), use Slack, or come see Annie or Carmen during student hours.

[Image: screenshot of a grader.check() cell]

  4. The very last cell of the notebook contains grader.check_all(). This runs all tests in the notebook and reports the tests that have passed as well as the ones that have failed. If the output of this cell looks like the image below (question numbers may differ), it means you passed all the autograder checks!

[Image: screenshot of the output when all autograder checks pass]

Reminders:

  • Make sure you’re keeping up with your classmates’ questions and answers on Slack.
  • When submitting your final notebook, follow the instructions above regarding how to name the notebook.

AI Policy

If you use generative AI on this assignment, you are expected to adhere to the following course policies:

  • ✅ Cultivate understanding: You should be able to fully understand, justify, and explain all the work you submit.
  • 🤔 Question AI outputs: The default should be to assume the answers you get from generative AI are incorrect and you must verify any information the platform generates.
  • 🚫 Academic integrity: Submitting work you don’t understand or can’t explain or justify will be considered plagiarism, regardless of whether you have disclosed the use of generative AI or not.
  • 📄 Document any AI use: If you do end up using generative AI in your work, you will need to complete and submit the Generative AI Use Documentation form and upload it to the “Generate AI Documentation” portal on Gradescope.

If there are concerns about AI use in your work, your instructor will ask you to meet and talk it through. If understanding is clearly lacking and this is the first time this happens, you’ll have the chance to revise and resubmit your work for 50% of the original maximum grade within two days.

Task 1: Datasheets for Datasets reading

So much goes into creating a dataset, and data is more than numbers and words in a file. Without a proper understanding of the whole context in which data was created, biases, omissions, and inaccuracies can go undetected. The Datasheets for Datasets [1] framework advocates for transparency about the purpose and contents of datasets.

Check out this short interview with lead author Dr. Timnit Gebru, the executive director of the Distributed Artificial Intelligence Research Institute (DAIR), on the motivation to write this article:

Read the paper and write a one-paragraph (between 100 and 150 words) reflection about it. Review the rubric for this assignment here. Answer at least one of the following questions for your reflection:

  1. Can you think of a dataset you have worked with or encountered in your studies that would have benefited from a datasheet? Explain why or why not, using specific details about the dataset’s context, collection methods, or biases.

  2. What do you think are the limitations of the datasheets framework? Are there any challenges or risks associated with this approach, and how might they be addressed in practical settings?

  3. How does the topic of transparency in datasets relate to your understanding of ethical data science practices? Provide an example where increased transparency could have changed the outcome of a dataset you have used or read about.

  4. Based on your previous professional experience, if you were tasked with creating a dataset for a project, what challenges or decisions would you face when creating its datasheet? Reflect on one or two aspects of data collection or transparency that you feel are particularly important.
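To check the 100-to-150-word limit before submitting, a rough word count can be sketched as below. It assumes words are separated by whitespace; the grading rubric may count differently, and the helper names are hypothetical.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumption about how words are counted)."""
    return len(text.split())


def within_limit(text: str, low: int = 100, high: int = 150) -> bool:
    """Return True when the word count falls inside the inclusive range [low, high]."""
    return low <= word_count(text) <= high


# Paste your reflection between the triple quotes to check its length.
reflection = """Replace this placeholder with your paragraph."""
print(word_count(reflection), within_limit(reflection))
```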

Ready to submit your answer? Make sure your submission follows the checklist at the top of the assignment!

Setup for tasks 2 and 3

  1. Fork this repository: https://github.com/MEDS-eds-220/eds220-hwk1

  2. In the workbench-1 server, start a new JupyterLab session or access an active one.

  3. Using the terminal, clone your eds220-hwk1 repository into your eds-220 directory.

  4. In the terminal, use cd to navigate into the eds220-hwk1 directory. Use pwd to verify eds220-hwk1 is your current working directory.

Task 2: Exploring coral diversity data

For this task we are going to use data about Western Indian Ocean Coral Diversity [2] stored in the Knowledge Network for Biocomplexity (KNB) data repository. The author of this dataset is Dr. Tim McClanahan, senior conservation zoologist at the Wildlife Conservation Society.

Dr. Tim McClanahan underwater surveying coral reefs in coastal Tanzania. Photo credit: ©Michael Markovina. From the online article How Mount Kilimanjaro and We Can Save Corals

Follow the instructions in the notebook hwk1-task2-corals.ipynb to complete this task. Review the rubric for this assignment here. In this task you will practice:

  • preliminary data exploration
  • accessing data using a URL from a data archive
  • selecting data from a data frame
  • basic git workflow
  • commenting your code
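As a warm-up for the skills listed above, here is a minimal pandas sketch of preliminary exploration and selection. It uses a tiny made-up table, not the coral dataset, and the column names (`site`, `year`, `n_species`) are hypothetical. The notebook has you read from a URL instead; `pd.read_csv` accepts a URL string in the same position as the in-memory buffer used here.

```python
import io

import pandas as pd

# Made-up stand-in data for illustration; NOT the coral diversity dataset.
csv_text = """site,year,n_species
A,2019,12
B,2019,30
A,2020,15
B,2020,28
"""

# pd.read_csv also accepts a URL string in place of this buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Preliminary exploration: first rows, dimensions, column types, unique values.
print(df.head())
print(df.shape)
print(df.dtypes)
print(df["site"].unique())

# Select rows with a boolean condition.
high = df[df["n_species"] > 14]

# Select rows and a subset of columns at once with .loc.
site_a = df.loc[df["site"] == "A", ["year", "n_species"]]
print(high)
print(site_a)
```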

Ready to submit your answers? Make sure your submission follows the checklist at the top of the assignment!

Task 3: pandas fundamentals with earthquake data

This task is adapted from the Pandas Fundamentals with Earthquake Data assignment from the e-book Earth and Environmental Data Science [3].

You will use simplified data from the USGS Earthquakes Database.

Follow the instructions in the notebook hwk1-task3-earthquakes.ipynb to complete this task. Review the rubric for this assignment here. Here you will practice:

  • accessing data from your directory
  • selecting data from a data frame
  • creating exploratory graphs
  • basic git workflow
  • commenting your code
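The sketch below illustrates reading a CSV file from a local directory and making an exploratory graph with pandas. It writes a small made-up table to a temporary file first so it runs anywhere; the values and the column names (`mag`, `depth`) are hypothetical and do not come from the USGS data. The `matplotlib.use("Agg")` line is only needed when running outside a notebook.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Made-up values for illustration; NOT real earthquake records.
toy = pd.DataFrame(
    {
        "mag": [4.1, 4.5, 5.0, 5.2, 6.3, 4.8],
        "depth": [10.0, 35.0, 12.5, 70.0, 22.0, 15.0],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    # Write the toy table to disk, then read it back from a file path.
    csv_path = Path(tmp) / "toy_quakes.csv"
    toy.to_csv(csv_path, index=False)
    quakes = pd.read_csv(csv_path)

# Exploratory graph: histogram of the magnitude column.
ax = quakes["mag"].plot(kind="hist", bins=5, title="Toy magnitudes")
ax.set_xlabel("magnitude")

# Select the three rows with the largest magnitudes.
strongest = quakes.sort_values("mag", ascending=False).head(3)
print(strongest)
```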

Ready to submit your answers? Make sure your submission follows the checklist at the top of the assignment!

References

[1]
T. Gebru et al., “Datasheets for datasets,” Commun. ACM, vol. 64, no. 12, pp. 86–92, Nov. 2021, doi: 10.1145/3458723. Available: https://doi.org/10.1145/3458723
[2]
T. McClanahan, “Western Indian Ocean Coral Diversity.” KNB Data Repository, 2023. doi: 10.5063/F1K35S3H. Available: https://knb.ecoinformatics.org/view/doi:10.5063/F1K35S3H. [Accessed: Sep. 16, 2024]
[3]
R. Abernathey, K. Key, T. Crone, and J. Busecke, An Introduction to Earth and Environmental Data Science. Available: https://earth-env-data-science.github.io/intro.html. [Accessed: Sep. 16, 2024]