Project

Goal

The goal of the project is to build something awesome that you are proud to showcase on your personal website.



Project requirements

Team size

You will be working on this project over the course of the semester in teams of 2 or 3. Individual teams are not allowed because one goal of the project is to practice collaboration via GitHub.

Aligning with other classes

You may choose a project topic that is the same or similar to a topic that you’re exploring in another course. This is encouraged! Combining multiple course perspectives can significantly increase the quality of your work.

Data sources

  • You are required to use at least 2 different data sources for this project.
  • I encourage (but am not requiring) you to use one of the new data acquisition tools covered in our course (APIs, web scraping, SQL) for one of these data sources.

Deliverables

You will create a digital artifact that can be hosted on your personal website or linked to from your personal website. Options include:

  • Blog post (single .qmd file)
  • Standalone Shiny app (e.g., dashboard style with an app.R file)
  • Blog post with Shiny features. This is like a blog post where you scroll through text and plots, but with this format, the plots can be interactive (mini Shiny apps). (In RStudio, when you choose File > New File > Quarto Document, you can choose the “Interactive” option to generate this format.)

Presentation

On the last day of class (Thursday, April 25) all teams will present their work. (We will not be using our final exam slot.)

  • Format: “Poster session”
    • Each team will set up a digital display of their work for about 30 minutes and talk to people who come by. We’ll do 3 rounds of 30 minutes with 3-4 teams presenting per round.
    • Why this format? Because we have 10 teams, it would feel too fast-paced to do sequential presentations. I hope that this format feels more low key and celebratory.
    • We’ll have breakfast snacks to celebrate!

Final submission

By the end of the day on Wednesday, May 1:

  • If your artifact has Shiny components (is a standalone Shiny app or hs a blog post with Shiny features), you have the following options:
    • Publish your app on shinyapps.io using the instructions here and share the link on Moodle.
    • Submit on Moodle all files needed for me to view your Shiny app on my computer. This will include the app.R file and any associated data files.
  • If your artifact is a blog post (without Shiny components):
    • Submit all files needed for me to view your post on Moodle. Double check my downloading your Moodle upload that all images are contained in the post.
    • Share on Moodle a link to your final blog post in your team GitHub repository.

All groups should make sure that the final codebase is pushed to your team’s GitHub repository.



Rubric

For each category, there will be a grade of Excellent, Passing, or Needs Improvement.

Category: Thoughtful engagement with data context

Overview of this category:

  • Careful consideration of the who, what, when, where, why, and how of your datasets and how that affects results and interpretation
  • Careful consideration of how the data context relates to ethical considerations of how you investigate your data and how your results should be used/interpreted. It will help to think about the following:
    • Who is affected by the project’s data acquisition and results?
    • What (mis)interpretations or actions might result from the conduct of your investigations or your conclusions?
    • How might negative consequences be mitigated?

Excellent work will:

  • Thoughtfully consider the items above by drawing on team members’ lived experiences AND perspectives from at least 2 other sources (e.g., news articles, research articles, blog posts, press releases, documentation from organizations affiliated with your data). These sources should be referenced in your final digital artifact.

Passing work will:

  • Thoughtfully consider the items above by drawing on team members’ lived experiences

Needs Improvement work will:

  • Attempt to consider the items above but need more thought

Category: Effective data storytelling

The final digital artifact should:

  • Motivate the importance of the topic
  • Lead the reader through the rationale for the narrowing/focusing of the scope via the main 2-3 broad questions
  • Tie results (plots and modeling output) to the broad questions and explain how all results fit together
  • Use sound data visualization principles to most effectively convey meaning
  • End with main takeaways, limitations, and future directions
  • Use clear and concise communication throughout

Excellent work will:

  • Meet all above the above quality expectations

Passing work will:

  • Meet most above the above quality expectations

Needs Improvement work will:

  • Meet some above the above quality expectations

Category: Code quality and documentation

Overview of this category:

  • Code duplication is minimal to nonexistent with the use of functions and loops
  • Use comments appropriately to document what is happening in different parts of code
  • Text space before and after code chunks is used for longer form (paragraph) documentation

Excellent work will:

  • Use functions in all instances where code would have been copied and pasted twice
  • Use functions and loops in all instances where code would have been copied and pasted 3 or more times
  • Consistently use (but not overuse) comments within R code to document code in a way that allows your future selves to remember what was going on. (Not every line of code needs to have a comment, but groups of lines that achieve a particular goal should have a comment.)
  • The text space before and after code chunks explains what is happening in each code chunk (e.g., why particular wrangling steps were performed, the motivation for fitting certain models, why different modeling outputs were extracted).

Passing work will:

  • Often use functions and loops to reduce code duplication but have a small number of missed opportunities
  • Often use code comments and text space before and after code chunks for documentation but have a small number of missed opportunities

Needs Improvement work will:

  • Sometimes use functions and loops to reduce code duplication but have several missed opportunities
  • Sometimes use code comments and text space before and after code chunks for documentation but have several missed opportunities

Category: File organization and version control

Overview of this category:

  • Separate files for cleaning, plotting, and modeling
  • Clean datasets are saved at the end of code files devoted to cleaning and loaded in at the start of code files devoted to plotting and modeling
  • As seen through GitHub commits and file diffs, each team member should contribute roughly equally to the codebase. Note that contributions include both writing new lines of code AND modifying/deleting code (e.g., perhaps to reduce code duplication).
    • Note: I am not going to count the number of lines added/modified by each team member. Rather, I will look at the GitHub commits and file diffs holistically.

Excellent work will:

  • Meet all above the above quality expectations

Passing work will:

  • Generally meet the above quality expectations but will require more separation of tasks across files or more use of saving clean datasets

Needs Improvement work will:

  • Sometimes meet the above quality expectations but will require more separation of tasks across files, more use of saving clean datasets, and more even contributions from team members to the codebase



Collaboration and GitHub

After Milestone 2, you set up your directory structure to organize your data separately from your code. With this setup, it is useful for each team member to have their own .qmd file for their own share of the explorations.

  • In this way when you commit, push, and pull, you won’t run into merge conflicts from overwriting contents of each other’s files.
  • Coordinating what explorations each team member is doing becomes the main topic to coordinate



Project milestones

Milestone 1

Due date: Monday, February 12

Purpose: The goal of Milestone 1 is to get the project moving early on in the semester to have time to make the final product as high quality as possible. You will form teams, lay the vision for your project, and make progress on that vision with one dataset.

Task (requirements for passing this Milestone):

Put the following information in a PDF, and submit on Moodle. (Only one team member per group needs to submit.)

  1. Write down names of all team members
  2. Briefly describe your topic/scope in a phrase/sentence.
  3. Describe 2-3 broad questions that you wish to explore within this topic. (Not all of them might be able to be investigated with the data source you find for this Milestone—that’s fine.)
  4. Find one data source, and read that data into R.
  5. Thoroughly describe the data context (who, what, when, where, why, and how? questions related to the dataset).
  6. Write up a data codebook. That is, describe the type and meaning of the variables in your dataset. Group your variables into categories (e.g., demographic variables, neighborhood variables).
    • If you have a lot of variables, it may not be necessary/feasible to describe every variable individually. Rather, you can describe groups of similar variables.
  7. Based on the data context write-up and your codebook, describe which of your 2-3 broad questions can be addressed with your dataset and how.
    • Write a plan for addressing these questions. Make sure that the steps in this plan are reasonable to complete in 2 weeks for Milestone 2. You will receive feedback on this plan and will be expected to integrate this feedback for Milestone 2.

Milestone 2

Due date: Monday, February 26

Purpose: The goal of Milestone 2 is to make progress on the goals you set out earlier and get tailored feedback on next steps to make the final product as high quality as possible.

Task (requirements for passing this Milestone):

  1. Create a team GitHub repository with a folder structure as follows:

    • your_github_project_folder
      • code
      • data
      • results
  2. Create a .gitignore file.

    • What is this? .gitignore is a special file that tells Git what files and folders to ignore in version control.
    • Steps to create this file:
      1. Open the Terminal in RStudio. Enter the command pwd. Make sure that a path to your project folder is displayed. If not, use the command cd RELATIVE/PATH/TO/YOUR/PROJECT/FOLDER to change the directory to your project folder. The part that comes after cd is a relative path from your current location to your project folder. If you need to use cd, use pwd again afterward to confirm that you are in your project folder.

      2. Enter the command touch .gitignore.

      3. Enter the command ls -a. You should see all files in this directory (including hidden files that start with .). You should see the .gitignore file.

      4. Enter the command open .gitignore. Note that you can use tab completion in the Terminal for typing shortcuts. After you type open .giti hit Tab to auto-complete the rest of the .gitignore file name. This will open the .gitignore file in RStudio or your computer’s plain text editor.

      5. Add the following lines to your .gitignore file, and save the file.

        data/
        .DS_Store (a hidden Mac file)
        .Rhistory
        .Rproj.user
        *.Rproj
        .quarto/
    • Add, commit, and push your code/ folder and the .gitignore file. Unless you change your .gitignore later, this is the only time you will need to add, commit, and push your .gitignore file.
  3. Add the instructor as a collaborator to your project GitHub repository. (GH username: lmyint)

  4. Add a code chunk to the end of all of your .Rmd/.qmd documents with sessionInfo()

    • When rendering your markdown file to HTML, this adds information about all packages used in the file as well as their package versions. As packages get updated over time, old code may break, so it is good to know what version of a package was used to complete your work so that you can restore that particular package version.
  5. Complete the steps in your plan from Milestone 1 (the plan with feedback from the instructional team)

  6. Write a plan for further pursuing your 2-3 broad questions. Make sure that the steps in this plan are reasonable to complete in 2 weeks for Milestone 3. You will receive feedback on this plan and will be expected to integrate this feedback for Milestone 3. Questions to think about as you develop this plan:

    • Do your 2-3 original broad questions need to be revised?
    • Is it time to start looking for additional datasets?

Milestone 3

Due date: Wednesday, March 27

Purpose: The goal of Milestone 3 is to make progress on the goals you set out earlier and get tailored feedback on next steps to make the final product as high quality as possible.

Task (requirements for passing this Milestone):

  1. Complete the steps in your plan from Milestone 2 (the plan with feedback from the instructional team)
  2. Write a short blog post (no more than 1000 words) about your results so far. Use a .qmd to generate an HTML file for this post. In this post, you should:
    • Motivate the importance of the topic
    • Lead the reader through the rationale for the narrowing/focusing of the scope via the main 2-3 broad questions
    • Tie results (plots and modeling output) to the broad questions and explain how all results fit together
    • End with main takeaways, limitations with regard to the data context and ethical considerations, and future directions