Coding strategies and principles

Core principle of writing good functions

A function should complete a single small task

Applying this principle takes hard work, but it will make your code easier to read, debug, and reuse.

Open up the Homework 8 page alongside this for a reflection exercise.

“Single” and “small” are hard to get a sense of, so let’s look at solutions from our previous scraping activity:

# Helper function to reduce html_elements() %>% html_text() code duplication
get_text_from_page <- function(page, css_selector) {
    page %>%
        html_elements(css_selector) %>%
        html_text()
}

scrape_page <- function(url) {
    Sys.sleep(2)
    page <- read_html(url)
    article_titles <- get_text_from_page(page, ".teaser-title")
    article_dates <- get_text_from_page(page, ".date-display-single")
    article_abstracts <- get_text_from_page(page, ".teaser-description")
    article_abstracts <- str_remove(article_abstracts, "^.+—") %>% trimws()
    
    tibble(
        title = article_titles,
        date = article_dates,
        abstract = article_abstracts
    )
}

get_text_from_page() does a single small task–it gets the contents from a single CSS selector

  • This facilitates its reuse: we can easily use this function in another context
  • It’s easily debugged and understandable because it’s so short.

If you only call your function once, you need to revise your code to have your functions do smaller tasks.

  • Looping probably should not be done inside a function. Rather, we will generally want to call functions from inside loops.
    • Why? When you are looping, you are doing a task over and over. That task can be turned into a function.
    • If you find yourself writing a loop within a function, you likely need to write a helper function.

Example of how this might come about:

get_all_news_info <- function(url, title_selector, date_selector, abstract_selector, num_pages) {
    for (i in seq_len(num_pages)) {
        # Construct url for page i
        # page <- read_html()
        # get_text_from_page(page, title_selector)
        # get_text_from_page(page, date_selector)
        # get_text_from_page(page, abstract_selector)
    }
}
  • This function does many tasks (gets many pages) instead of a single task (just one page).





Core principles of navigating ChatGPT output

ChatGPT is a useful tool, but it is important to be mindful of the following principles.

Whenever you see a new function, look up the documentation. Understand how all of the arguments used in the ChatGPT output. For any function that is new to you, you should be able to explain its usage with a few new examples.

In addition to understanding the output, evaluate if the output follows good coding principles, like having a function doing a single, small task.

Creating a full suite of test cases still applies when using ChatGPT output.