Preliminary Syllabus
(Version BEFORE student input on learning goals and grading system)
What is this course about?
Nature doesn’t reveal its secrets easily.
- Thomas Kempa
Nor do data.
But that is exactly what can make data science so thrilling!
This course is about empowering you with the wisdom to ask the best questions of data–ones that are meaningful, adaptive, and equity-minded–and the technical savvy to answer them.
Because your careers (whether in data science or not) will all involve further learning and working with others, my other primary goal is for you to cultivate self-reflection skills with regard to your own learning and your collaboration with others. In this way, I hope that you feel confident learning new skills on your own in the future and contributing to a welcoming work community.
This second course in the data science curriculum emphasizes advanced data wrangling and manipulation, interactive visualization, writing functions, working with data in databases, version control, and data ethics. Through open-ended and interdisciplinary projects, students practice the constant feedback loop of asking questions of the data, manipulating the data to help answer the question, and then returning to more questions. Prerequisite(s): COMP 112 and COMP 123 and STAT 155; STAT 253 recommended but not required.
Course learning goals
By the end of this course you should be able to:
- Sustain a habit of self-reflection in your learning process so that you are equipped for independent learning
- Sustain a habit of self-reflection in your collaborative work so that you can form community no matter where you go
- Create a variety of visualizations in ggplot2 that go beyond the plot types that you learned in STAT/COMP 112
- Wrangle and visualize spatial data
- Create interactive web applications and visualizations that adapt to user input
- Wrangle arbitrarily messy data using functional programming tools in R
- Acquire data from a variety of sources
- Write statements in structured query language (SQL) to access data from databases
- Write code to access data from application programming interfaces (APIs)
- Write code to scrape data from websites and evaluate the ethics of collecting such data
- Use appropriate methods when working with missing data
- Translate code between R and Python
- Articulate the role of machine learning and causal inference in data science work
- Iterate on the question-explore-question cycle to craft compelling data stories with attention to ethical considerations
- Maintain a digital portfolio of your data science projects
- Which of the learning goals above do you disagree with or want more clarity on?
- Do you have any goals that you’d like to include on this list?
Course communication
How to contact me
Students sometimes wonder what to call their professors. I prefer to be called Leslie (lez-lee), but if you prefer to be more formal, I am also ok with Professor Myint (pronounced “mee-int”). My preferred gender pronouns are she/her/hers.
Please help me make sure that I call you by your preferred name and pronouns too!
I love getting to talk to students outside of class time—whether about class-related topics or anything else. Come chat with me!
I’ll be setting times for drop-in hours based on feedback from the pre-course survey. I’ll update my drop-in hours on our course homepage and Moodle when they’re finalized.
I’m also happy to meet one-on-one if my normal drop-in hours don’t work. You can schedule a time to meet with me via Calendly.
I’ll also often be in the Leonard Center weight room from about 4:30-5:10 some weekdays. Feel free to say hello!
Discussion board (Slack)
Slack is a commonly used communication tool in industry and is useful to be familiar with, so we’ll be using it as our discussion board.
- If you’re new to Slack, this video provides a quick overview.
- First join our STAT/COMP 212: Fall 2023 workspace here.
- After joining, you can access our workspace here. (You might want to bookmark this if you have Slack open in your web browser.)
Guiding values
Community is key
A sense of community and connectedness can provide a powerful environment for learning: Research shows that learning is maximized when students feel a sense of belonging in the educational environment (e.g., Booker, 2016). A negative climate may create barriers to learning, while a positive climate can energize students’ learning (e.g., Pascarella & Terenzini, cited in How Learning Works, 2012).
Our class is participating in the Classroom Community & Connectedness Project this semester. On Thursday, September 21, our class sessions will be a facilitated activity (peers and facilitators only; I will not be present) to collectively reflect on strengthening our classroom community, with the intent of improving the learning environment for all of us.
Your participation is voluntary, but very much appreciated. The project is co-sponsored by Macalester’s Serie Center for Scholarship and Teaching and our Office of Institutional Research & Assessment.
Reflection is paramount
The content you learn will be cool (unbiased opinion!), but it is a guarantee that as technology evolves, some part of it will become out of date during your careers. What you will need to rely on when you leave Macalester is what I want to ensure you cultivate now: a good learning process. And the cornerstone of a good learning process is reflection.
Reflection is not just fundamental to learning content–it’s fundamental to learning any sort of intellectual, emotional, or physical skill. For this reason, I will be prioritizing reflection as a goal for our course in both content learning and collaborative activities. (Note that these reflection goals are the first two course learning goals.)
Mistakes are essential
An expert is a person who has made all the mistakes which can be made in a narrow field.
- Niels Bohr, Nobel Prize-winning physicist
I don’t feel comfortable working with a new R package until I’ve seen the same errors over and over again. Seeing new errors helps me understand the constraints of the code and the assumptions that I was making about my data.
We’re going to be seeking out mistakes like my cats hunt for leftover food scraps.
Communication is a superpower
Every time I go to a conference talk on a technical topic, it is striking how quickly laptops or phones come out because of the inability to follow. Academics notoriously struggle to make ideas accessible to others.
I want communication to be very different for you.
Every time you communicate ideas–whether through writing, visuals, or oral presentation–I want you to be a total boss. The end product of strong communication is a better experience for all those who have given you their attention. What’s more, the process of crafting effective communication is invaluable for deepening your own understanding:
Read to collect the dots, write to connect them.
- David Perell (@david_perell), July 5, 2021
How to thrive and what to expect
When taking a new course, figuring out the right workflow/cadence of effort throughout the week can be a big adjustment. And most of you are doing this for 4 different courses! Below are some suggestions for what to expect in the course and how to focus your time and attention during and outside of class.
Outside of class
Pre-class reading: Most class periods will have a required reading to review ideas from previous courses or to familiarize yourself with new concepts before seeing them again in class. My goal for these readings is for you to get the most out of class time by being able to more easily follow explanations in class and to engage most fully in class activities. I will provide Guiding Questions for each reading to focus your attention.
Scan the Guiding Questions before reading to preview the main ideas. Fill in answers to these questions as you read. Ask (and answer!) questions in the #questions channel in our Slack workspace.
Podcast discussions: About every other week, we will discuss a podcast for the first ~20 minutes of class. These podcasts are meant to expose you to aspects of data science in industry. Record anything that you’re curious about, and come prepared to discuss.
Throughout the semester, if you come across any media that is relevant to the course, feel free to suggest it as a discussion piece by emailing the instructor or posting it in the #general channel in our Slack workspace.
- Record any reflections from in-class time about your learning process or interactions with peers while they are still fresh.
- After learning a new topic in class, it is helpful to immediately attempt the related exercises on the weekly homework.
- Come to instructor drop-in hours to chat about the course or anything else 😃
During class
Class time will be a mix of interactive lecture and longer stretches of group work. During the lecture portion, I will pause explanation frequently to prompt a short exercise or ask questions that you’ll reflect on individually or together.
Review your learning process and group work reflections just before class to frame how you want to engage in class. (Perhaps you’ve noted a struggle and want to try a new strategy.) I’ll always leave a few minutes at the end of class for synthesis. Use this time to update your reflections and summarize the key takeaways from class.
Grading and feedback
For a long time, I have been uncomfortable with the outsized role that letter grades play in education. The article Teaching More by Grading Less (Or Differently) by Schinske and Tanner (source) shaped many of my viewpoints and may be an interesting read for you.
In short, grades tend to distract from learning (due to a greater focus on the grade than on qualitative feedback), create anxiety that hinders risk-taking and exploration, and create a power dynamic between the instructor and students that I am uncomfortable with.
If I had my way, I would never assign letter grades and only give qualitative comments all semester. Unfortunately, I am required to submit a letter grade at the end of the course.
I need your input: which of the following two grading systems should we use for the semester?
As you read the following, make note of what uncertainties or worries you have and what you agree with.
Come prepared to discuss with your classmates on the first day of class. We’ll use the first day to collaboratively decide on one grading system for both sections of the course. Our goal is for this grading system to be aligned with my values as an instructor and your needs as learners.
Option 1: evaluation of learning is done by instructor
The table below describes the qualities of A, B, and C grades in terms of the 3 core course components: reflections, homework, and the final project.
In order to earn a given letter grade, ALL of the requirements in the column must be met. (e.g., In order to earn an A, everything in the first column must be true.)
There is a possibility that by the end of the semester, work for the different course components falls under different letter grades. (e.g., A-level for reflections and homework, but B-level for project).
- If this happens, towards the end of the semester I will ask you to submit a written justification for the grade that you think you deserve. We will have a conference in which we discuss your justification.
- If we agree, your proposed grade will be your final grade.
- If we disagree, we will discuss what needs to be done next to come to an agreement. (This may involve doing extra work or revisions.)
Course component | Letter grade: A | Letter grade: B | Letter grade: C |
---|---|---|---|
Self-reflection in learning process | Show clear growth and consistent thoughtfulness throughout the semester | Show some growth and some thoughtfulness throughout the semester | Show little to no growth and minimal thoughtfulness throughout the semester |
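Self-reflection in collaborative learning (groupwork) | Show clear growth and consistent thoughtfulness throughout the semester | Show some growth and some thoughtfulness throughout the semester | Show little to no growth and minimal thoughtfulness throughout the semester |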
Weekly homework | Pass all homework assignments | Pass all but 1 homework assignment | Pass all but 2 homework assignments |
Final project | Complete a project that is high quality in all of the following aspects: | Complete a project that is at least ok quality in the aforementioned aspects and high quality in some aspects | Complete a project that is ok quality in the aforementioned aspects |
Details about the course components are given below:
- Self-reflection:
- I will look at the growth in your reflection process over the course of 3 substantive reflections. These will roughly come a month into the course, mid-course, and the end of the semester.
- My expectation is that your mid-course and end-of-course reflections show that you are using feedback from previous reflections.
- I will look at the consistency in your reflection process by monitoring your Process and Reflection Google Doc.
- My expectation is that you are filling this in regularly throughout class activities and through work outside of class.
- Homework: We will have weekly homework in which you will complete activities that we start in class. The requirements for passing the assignment will be clearly stated in the homework directions.
- If you do not pass an assignment and wish to revise your work, you may resubmit the assignment the following week. This revision needs to be submitted by 2 weeks after the original homework due date. (See my policy on late work.)
- Final project: For this semester-long project, you will receive regular feedback on the quality of the different project aspects and will have the opportunity to improve the quality of the different aspects as you continue working.
Option 2: evaluation of learning is done by students in conversation with instructor
What constitutes A, B, and C-level work is essentially identical in Options 1 and 2 (see table below). That is, my standards for high quality work are the same between the two.
What’s different is who participates in the evaluation process and how. In Option 1, I am making the final judgment on the quality of your work for all course components. In Option 2, you will start the assessment of quality by evaluating yourselves using qualitative feedback on all of your work.
In Option 2, the qualitative comments that you receive on all your work (reflections, homework, project) will form the basis of your self-evaluation. That is, you will review all comments from me and use them to gauge the degree to which you are meeting the course learning goals. (On homework, you will not receive a pass/not passing designation; you’ll just receive comments.) You will write self-evaluations ~1 month into the semester, in the middle of the semester (before midterm grades are due), and a final one at the end of the semester. In all of these self-evaluations, you will propose the grade that you think you deserve. I will also evaluate your work, and if we disagree on the grade, we will meet to discuss what needs to be done next to come to an agreement. (This may involve doing extra work or revisions.)
Note: Your self-evaluations need to be based on demonstrated evidence in the work that you submit. For example, if you have struggled with the data wrangling homework assignments and say in your self-evaluation that your understanding is now strong after reviewing feedback, this is not sufficient evidence. You would need to demonstrate your stronger understanding of data wrangling by submitting a revision of the data wrangling homework. (See policy on late work.)
Course component | Letter grade: A | Letter grade: B | Letter grade: C |
---|---|---|---|
Self-reflection in learning process | Show clear growth and consistent thoughtfulness throughout the semester | Show some growth and some thoughtfulness throughout the semester | Show little to no growth and minimal thoughtfulness throughout the semester |
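Self-reflection in collaborative learning (groupwork) | Show clear growth and consistent thoughtfulness throughout the semester | Show some growth and some thoughtfulness throughout the semester | Show little to no growth and minimal thoughtfulness throughout the semester |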
Weekly homework | Show strong understanding of concepts across all homework assignments | Show strong understanding of concepts across most homework assignments | Show adequate understanding of concepts across most homework assignments |
Final project | Complete a project that is high quality in all of the following aspects: | Complete a project that is at least ok quality in the aforementioned aspects and high quality in some aspects | Complete a project that is ok quality in the aforementioned aspects |
Both options satisfy my goal to have high standards for your learning that are achievable through thoughtful revision. No matter what option we use, you’ll have a personal Google Sheet for tracking your progress over time.
My other goals for our grading system are for it to motivate you to learn as much as possible and to reduce stress. For these goals, your input during our Day 1 discussion will be very valuable.
Textbooks
We will primarily use the following two textbooks (freely available online):
- Modern Data Science with R (3e) by Baumer, Kaplan, and Horton (Abbreviated as MDSR)
- R for Data Science (2e) by Wickham, Cetinkaya-Rundel, and Grolemund (Abbreviated as R4DS)
The following textbooks are also good resources (also freely available online):
- Introduction to Data Science: Data Wrangling and Visualization with R and Advanced Data Science: Statistics and Prediction Algorithms Through Case Studies by Irizarry
- Tidyverse Skills for Data Science by Wright, Ellis, Hicks, and Peng
- R Programming for Data Science by Peng
Other policies
Late work
Homework assignments will be due weekly on Wednesdays at midnight. If you anticipate needing more time to complete an assignment, please email me ahead of time to discuss. Limited extensions will always be granted:
- My ideal extension: Turn in the homework by the following Monday morning at 9am. (A 4 day, 9 hour extension)
- Why is this ideal for me? I want to return feedback on homework to everyone before the following Tuesday’s class because we will be briefly reviewing homework feedback in small groups.
- Firm limit on extensions: You must turn in the homework by 2 weeks after the due date (Wednesdays at 9am).
- Why is this my firm limit? I post solutions to the homework at this point.
- What if it’s past the 2-week limit and you still want to turn in the homework in some form? If this is the case, you need to create your own equivalent homework. This involves mapping the original homework’s exercises to a new dataset and completing those exercises. Note that I can’t make any guarantees about when I can get you feedback on this late submission. (Only guarantee = by the end of the semester)
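As a rough illustration of the extension timeline above, here is a minimal Python sketch. The function name `homework_deadlines` is hypothetical and not part of any course tooling. It rests on two assumptions: "Wednesdays at midnight" is read as the very end of Wednesday (which matches the stated 4 day, 9 hour extension), and the firm limit is read as 9am on the Wednesday two weeks after the due date.

```python
from datetime import date, datetime, time, timedelta


def homework_deadlines(due_wednesday: date) -> dict[str, datetime]:
    """Return the key submission times for one homework assignment.

    Hypothetical helper for illustration only.
    """
    if due_wednesday.weekday() != 2:  # Monday is 0, so Wednesday is 2
        raise ValueError("homework is due on Wednesdays")

    # Assumption: "Wednesday at midnight" means the end of Wednesday,
    # i.e. 00:00 on Thursday.
    due = datetime.combine(due_wednesday + timedelta(days=1), time(0, 0))

    # Ideal extension: the following Monday at 9am.
    ideal_extension = datetime.combine(
        due_wednesday + timedelta(days=5), time(9, 0)
    )

    # Assumption: the firm limit is 9am on the Wednesday two weeks later.
    firm_limit = datetime.combine(
        due_wednesday + timedelta(weeks=2), time(9, 0)
    )

    return {
        "due": due,
        "ideal_extension": ideal_extension,
        "firm_limit": firm_limit,
    }


deadlines = homework_deadlines(date(2023, 9, 13))  # a Wednesday in Fall 2023
print(deadlines["ideal_extension"] - deadlines["due"])  # 4 days, 9:00:00
```

If the exact cutoffs matter for a particular assignment, the dates posted with that assignment take precedence over this sketch.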
Academic integrity
Academic integrity is the cornerstone of our learning community. Students are expected to be familiar with the college’s standards on academic integrity.
I encourage you to work with your classmates to discuss material and ideas for assignments, but in order for you to receive individualized feedback on your own learning, you must submit your own work. This involves writing your own code and putting explanations into your own words. Always cite any sources you use, including AI (see section below).
Artificial intelligence (AI) use
Learning to use AI tools is an emerging skill that we will explore together in this course. I expect you to use AI (ChatGPT, Google Bard)—in fact, some assignments may require it.
However, you should be aware of the limits of AI:
- AI is a tool, but one that you need to acknowledge using. Any ideas, language, or code that is produced by AI must be cited, just like any other resource. [sample suggestion: Please include a paragraph at the end of any assignment that uses AI explaining what you used the AI for and what prompts you used to get the results.] Failure to do so is in violation of the academic integrity policy at Macalester College.
- Don’t trust anything AI says. If it gives you a number, fact, or code, assume it is wrong unless you either know the answer or can check in with another source. AI works best for topics you understand.
- If you provide minimum effort prompts, you will get low quality results. You will need to refine your prompts in order to get good outcomes. This will take work.
- Be thoughtful about when this tool is useful. Don’t use it if it isn’t appropriate for the case or circumstance.
- The environmental impact of AI should not be ignored. The building and usage of AI tools consumes a lot of energy (see here and here). For this reason, we will be very thoughtful about when we use AI and will discuss other sustainability behaviors that we can incorporate into our lives to offset this usage.
How to cite usage of AI: Please copy and paste all prompts and output into an Appendix section accompanying each problem of an assignment.
If you have any questions about your use of AI tools, please contact me to discuss them.
Course project
The best way to learn data science and feel like a data scientist is to work on meaningful data-driven projects. The course project will be a semester-long, collaborative experience in which you investigate a series of meaningful questions through one or more datasets. Through collaboration, you will be able to build something way cooler than you could do alone.
My hope is for everyone to build something that you would be proud to showcase to an employer on your digital portfolio (your personal website). More details about the project can be found on the Project page.
The environment you deserve
I want you to succeed. Both here at Macalester and beyond. To help make this happen, I am committed to the following.
Respect: Everyone comes from a different path through life, and it is our moral duty as human beings to listen to each other without judgment and to respect one another. I have no tolerance for discrimination of any kind, in and out of the classroom. If you are seeking campus resources regarding discrimination, the Department of Multicultural Life and the Center for Religious and Spiritual Life are wonderful resources.
Sensitive Topics: Data science applications span issues in science, policy, and society. As such, we may sometimes address topics that are sensitive for you. I will try to announce in class if an assignment or activity involves a potentially sensitive topic. If you have reservations about a particular topic, please come talk to me to discuss possible options.
Accommodations: If you need accommodations for any reason, please contact Disability Services to discuss your needs, and speak with me as soon as possible afterwards so that we can discuss your accommodation plan. If you already have official accommodations, please discuss these with me within the first week of class so that you get off to a great start. Contact me if you have other special circumstances. I will find resources for you.
Title IX: You deserve a community free from discrimination, sexual harassment, hostility, sexual assault, domestic violence, dating violence, and stalking. If you or anyone you know has experienced harassment or discrimination, know that you are not alone. Macalester provides staff and resources to help you find support. More information is available on the Title IX website.
General Health and Well-being: I care that you prioritize your well-being in this semester and beyond. Investing time into taking care of yourself will have profound impacts on all aspects of your life. Remember that beyond being a student, you are a human being carrying your own experiences, thoughts, emotions, and identities. It is important to acknowledge any stressors you may be facing, which can be mental, emotional, physical, cultural, financial, etc., and how they can have an impact on you. I encourage you to remember that you have a body with needs. In the classroom, eat when you are hungry, drink water, use the restroom, and step out if you are upset and need some air. Please do what is necessary so long as it does not impede your or others’ ability to be mentally and emotionally present in the course. Outside of the classroom, sleeping well, moving your body, and connecting with others can be strategies that help nourish you. If you are having difficulties maintaining your well-being, please don’t hesitate to contact me and/or find support from physical and mental health resources here and here.