
I messed around with DBI and RSQLite and learned it’s actually pretty simple to use in R – just connect, write tables, and use SQL queries without all the complicated server stuff. Thanks to Alec Wong for suggesting this!


Motivation

After our last blog, my friend Alec Wong suggested that I switch from storing data in CSV files to SQLite when building a Plumber API. I had no idea that CSV files can get corrupted when multiple users hit the API at the same time! SQLite handles this automatically and lets you validate your data, all without setting up any complicated server. It’s actually pretty straightforward, so here is a note to myself on some simple, frequently used functions.

Objectives

Connecting to a Database

library(DBI)
library(RSQLite)
library(tidyverse)

con <- dbConnect(drv = RSQLite::SQLite(), "test.sqlite")

That’s it! If the file does not exist, it will create one.
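If you just want to experiment, RSQLite can also create a temporary in-memory database (a quick sketch; con_mem is just an illustrative name, and the data vanishes once you disconnect):

con_mem <- dbConnect(RSQLite::SQLite(), ":memory:")
# ...play around...
dbDisconnect(con_mem)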

List Tables

Let’s create a sample dataframe and write it to a table in the database.

# example df
employees <- tibble(
  name = c("John Doe", "Jane Smith", "Bob Johnson", "Alice Brown"),
  department = c("IT", "HR", "Finance", "Marketing"),
  salary = c(75000, 65000, 80000, 70000)
)

# write df to a table in the database
dbWriteTable(conn = con, name = "employees", value = employees)

# see what tables are in the database
tables <- dbListTables(con)
tables

## [1] "employees"

Pretty straightforward!
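A related helper I find handy is dbExistsTable(), which checks for a table before you write to it; dbWriteTable() also accepts overwrite = TRUE if you want to replace an existing table rather than append to it. A small sketch:

dbExistsTable(con, "employees")

## [1] TRUE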

Check Data

## Method 1
employees_db <- tbl(con, "employees")
employees_db |> collect()

## # A tibble: 4 × 3
##   name        department salary
##   <chr>       <chr>       <dbl>
## 1 John Doe    IT          75000
## 2 Jane Smith  HR          65000
## 3 Bob Johnson Finance     80000
## 4 Alice Brown Marketing   70000

We have to use collect() to return a dataframe. We can also do this instead:

## Method 2
dbGetQuery(con, "select * from employees")

##          name department salary
## 1    John Doe         IT  75000
## 2  Jane Smith         HR  65000
## 3 Bob Johnson    Finance  80000
## 4 Alice Brown  Marketing  70000
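Because tbl() gives us a lazy dbplyr table, plain dplyr verbs also work and get translated to SQL behind the scenes. A small sketch (assuming the dbplyr package is installed); show_query() prints the generated SQL instead of running it:

tbl(con, "employees") |>
  filter(salary > 70000) |>
  show_query()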

Add Data

## Create New Row of Data
new_employee <- data.frame(
  name = "Sarah Johnson",
  department = "Research",
  salary = 78000
)

## Write to existing table
dbWriteTable(conn = con, name = "employees", value = new_employee, append = TRUE)

tbl(con, "employees") |> collect()

## # A tibble: 5 × 3
##   name          department salary
##   <chr>         <chr>       <dbl>
## 1 John Doe      IT          75000
## 2 Jane Smith    HR          65000
## 3 Bob Johnson   Finance     80000
## 4 Alice Brown   Marketing   70000
## 5 Sarah Johnson Research    78000

The dataframe must contain the same column names and the same number of columns; otherwise, it won’t work.

## New column
new_employee <- data.frame(
  name = "Sarah Johnson",
  department = "Research",
  salary = 78000,
  something_new = 12321321
)

dbWriteTable(con, "employees", value = new_employee, append = TRUE)
# OR
# dbAppendTable(con, "employees", new_employee)

Error: Columns `something_new` not found
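A quick way to see which columns the existing table expects before appending is dbListFields():

dbListFields(con, "employees")

## [1] "name"       "department" "salary"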

Query Data

Filter

dbGetQuery(con, "select * from employees where department = 'Research'")

##            name department salary
## 1 Sarah Johnson   Research  78000
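The same filter can also be written with a bound parameter via DBI’s params argument, which is the safer pattern when the value comes from user input (for example, from a Plumber endpoint):

dbGetQuery(con, "select * from employees where department = ?",
           params = list("Research"))

##            name department salary
## 1 Sarah Johnson   Research  78000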

Filter With Matching Operator

dbGetQuery(con, "select * from employees where name like '%john%'")

##            name department salary
## 1      John Doe         IT  75000
## 2   Bob Johnson    Finance  80000
## 3 Sarah Johnson   Research  78000

Notice that matching is case-insensitive when we use like.

dbGetQuery(con, "select * from employees where name like 's%'")

##            name department salary
## 1 Sarah Johnson   Research  78000
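If case matters, SQLite’s glob operator does case-sensitive matching (with * instead of % as the wildcard); a small sketch:

dbGetQuery(con, "select * from employees where name glob 'S*'")

##            name department salary
## 1 Sarah Johnson   Research  78000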

Group By Department and Return Average Salary

dbGetQuery(con, "select department, avg(salary) as avg_salary
           from employees
           group by department")

##   department avg_salary
## 1    Finance      80000
## 2         HR      65000
## 3         IT      75000
## 4  Marketing      70000
## 5   Research      78000
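The grouped query extends naturally with having and order by, for example to keep only departments averaging at least 75000, highest first (a small sketch):

dbGetQuery(con, "select department, avg(salary) as avg_salary
           from employees
           group by department
           having avg(salary) >= 75000
           order by avg_salary desc")

##   department avg_salary
## 1    Finance      80000
## 2   Research      78000
## 3         IT      75000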

Sum Salary With New Column Name

dbGetQuery(con, "select sum(salary) as total_salary from employees")

##   total_salary
## 1       368000

Count Number of Departments

dbGetQuery(con, "select count(distinct department) as distinct_department
           from employees")

##   distinct_department
## 1                   5

Using glue_sql

var <- c("name","department")
table <- "employees"
query <- glue::glue_sql("select {`var`*} from {`table`}", .con = con)
dbGetQuery(con, query)

##            name department
## 1      John Doe         IT
## 2    Jane Smith         HR
## 3   Bob Johnson    Finance
## 4   Alice Brown  Marketing
## 5 Sarah Johnson   Research

Notice the asterisk (*) inside the braces after `var` – this tells glue_sql() to collapse the elements into a comma-separated list automatically. glue_sql provides an f-string feel to the code.
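Dropping the backticks makes glue_sql() treat the expression as a value rather than an identifier, so strings get quoted and escaped safely; a small sketch reusing table from above:

dept <- "Research"
query <- glue::glue_sql("select * from {`table`} where department = {dept}", .con = con)
dbGetQuery(con, query)

##            name department salary
## 1 Sarah Johnson   Research  78000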

Remove Data

## Delete Using Filter
dbExecute(con, "delete from employees where name = 'Sarah Johnson'")

## [1] 1

dbGetQuery(con, "select * from employees")

##          name department salary
## 1    John Doe         IT  75000
## 2  Jane Smith         HR  65000
## 3 Bob Johnson    Finance  80000
## 4 Alice Brown  Marketing  70000

Remove With Filter

dbGetQuery(con, "select * from employees")

##          name department salary
## 1    John Doe         IT  75000
## 2  Jane Smith         HR  65000
## 3 Bob Johnson    Finance  80000
## 4 Alice Brown  Marketing  70000

dbExecute(con, "delete from employees where salary >= 75000 and department = 'Finance'")

## [1] 1

dbGetQuery(con, "select * from employees")

##          name department salary
## 1    John Doe         IT  75000
## 2  Jane Smith         HR  65000
## 3 Alice Brown  Marketing  70000

Notice how = requires a case-sensitive match on 'Finance' (capital F) to filter accurately? Bob is no longer in the table!
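One more habit worth building (not run above, just a sketch): wrapping multi-step changes in dbWithTransaction() so they either all succeed or all roll back together.

dbWithTransaction(con, {
  dbExecute(con, "update employees set salary = salary * 1.05 where department = 'IT'")
  dbExecute(con, "delete from employees where salary < 0")
})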

Disconnect

dbDisconnect(con)
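When the connection lives inside a function, on.exit() makes sure it gets closed even if a query errors; a small sketch with a hypothetical helper:

get_employees <- function(db = "test.sqlite") {
  con <- dbConnect(RSQLite::SQLite(), db)
  on.exit(dbDisconnect(con), add = TRUE)
  dbGetQuery(con, "select * from employees")
}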

Acknowledgement

Thanks again to Alec for suggesting improvements on our previous project!

For Completeness’ Sake: The Prior plumber.R

library(plumber)
library(tidyverse)
library(lubridate)
library(DBI)
library(RSQLite)

path <- "" #set your own path
con <- dbConnect(RSQLite::SQLite(), paste0(path,"migraine.sqlite"))

#* @apiTitle Migraine logger

#* Return HTML content
#* @get /
#* @serializer html
function() {

  # Return HTML code with the log button
  html_content <- '
     <!DOCTYPE html>
     <html>
     <head>
       <title>Migraine Logger</title>
     </head>
     <body>
       <h1>Migraine Logger</h1>
       <button id="submit">Oh No, Migraine Today!</button>
       <div id="result" style="display: none;"></div>

      <script>
       document.getElementById("submit").onclick = function() {
          fetch("/log", {
            method : "post"
          })
          .then(response => response.json())
          .then(data => {
            const resultDiv = document.getElementById("result");
            resultDiv.textContent = data[0];
            resultDiv.style.display = "block";
          })
          .catch(error => {
            const resultDiv = document.getElementById("result");
            resultDiv.textContent = error.message
          })
       };
      </script>

     </body>
     </html>
     '
  return(html_content)
}

#* logging
#* @post /log
function(){
  date_now <- tibble(date=Sys.time())
  dbWriteTable(con, "migraine", date_now, append = TRUE)
  list(paste0("you have logged ", date_now$date[1], " to migraine database"))
}

#* download data
#* @get /download
#* @serializer contentType list(type="text/csv")
function(){
  # Just return the raw CSV content
  df <- tbl(con, "migraine") |> collect() |> mutate(date = as_datetime(date, tz = "America/New_York"))
  format_csv(df)
}

#* Check datetime on browser
#* @get /table
function(){
  df <- tbl(con, "migraine") |> collect() |> mutate(date = as_datetime(date, tz = "America/New_York"))
  list(df)
}
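One thing I may add near the top of plumber.R (an untested sketch) is making sure the migraine table exists with the expected column before the first request arrives:

# create the table up front if it isn't there yet
if (!dbExistsTable(con, "migraine")) {
  dbCreateTable(con, "migraine", data.frame(date = numeric(0)))
}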

Lessons Learnt

  • Lots of goodies on the DBI official website
  • Learnt how to set up SQLite on the Raspberry Pi and incorporated it into the previous migraine logger
  • Definitely need to be comfortable with SQL to use this
  • Might be a good idea to add this to the pressure logger too! Maybe in the same database but a different table – see the sketch below
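For instance, the pressure logger could live in the same SQLite file as its own table (a hypothetical sketch; the table and column names are made up):

pressure_con <- dbConnect(RSQLite::SQLite(), "migraine.sqlite")
dbWriteTable(pressure_con, "pressure",
             tibble(date = Sys.time(), systolic = 120, diastolic = 80),
             append = TRUE)
dbDisconnect(pressure_con)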
