Learning Analytics
workshop support site
Machine Learning - Using Naïve Bayes to predict locations

lastmod: 11 May, 2020

Preamble

Suppose that our University has a surveillance system in place that can track residents’ movements while they are on campus. For our exercise, let us assume we want to find out where one of the students, creatively named John, is on a given day and at a given time of day. To do that, we extract only the required information from our big, big data about all students, i.e., the type of day (weekday vs. weekend) and John’s known past locations in the morning (say, between 08:00 and 12:00).

Finding John’s location depending on the day type (weekday or weekend)

  • Load the naivebayes package.

  • Use naive_bayes() with a formula like y ~ x to build a model of location as a function of daytype.

  • Predict John’s location on a weekday morning, using predict() with the weekday_morning object as the newdata argument.

  • Repeat the prediction for a weekend_morning location.

Solution

# Prepare the environment (tidyverse attaches dplyr and readr)
library(tidyverse)
locations_data <- readr::read_csv(here::here("/content/post/ml-nb-predictloc/_data/locations_john_morning.csv"))

# Build a one-row data frame with daytype == "weekday" to pass as newdata to predict()
weekday_morning <- locations_data %>% dplyr::select(daytype) %>% dplyr::filter(daytype == "weekday") %>% slice(n())
# Build a one-row data frame with daytype == "weekend" to pass as newdata to predict()
weekend_morning <- locations_data %>% dplyr::select(daytype) %>% dplyr::filter(daytype == "weekend") %>% slice(n())

# Load the naivebayes package
library(naivebayes)

# Build the location prediction model
locmodel <- naive_bayes(location ~ daytype, data = locations_data)


# Predict location on a weekday
predict(locmodel, weekday_morning)
## [1] class
## Levels: campus class dorm tutoring
# Predict location on a weekend
predict(locmodel, weekend_morning)
## [1] dorm
## Levels: campus class dorm tutoring
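
It may help to see where such a model gets its numbers. With a single categorical predictor, a Naive Bayes model is estimated from plain frequency counts. The base R sketch below is not the naivebayes package's internal code. It uses hypothetical counts: the counts matrix is an assumption, chosen only so that its proportions agree with the probabilities that locmodel reports when printed, and the real data file may hold different counts.

```r
# Hypothetical morning counts (an assumption, NOT read from the CSV file)
counts <- matrix(
  c(10, 39, 15, 1,    # weekday mornings
     0,  0, 26, 0),   # weekend mornings
  nrow = 2, byrow = TRUE,
  dimnames = list(
    daytype  = c("weekday", "weekend"),
    location = c("campus", "class", "dorm", "tutoring")
  )
)

# A priori probabilities: share of each location over all mornings
prior <- colSums(counts) / sum(counts)

# Conditional probabilities P(daytype | location): normalise each column
conditional <- prop.table(counts, margin = 2)

print(round(prior, 4))
print(round(conditional, 4))
```

In these hypothetical counts, class dominates weekday mornings and dorm is the only location ever seen on a weekend, which is the pattern behind the two predictions above.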

Find the “raw” probabilities of John being at those locations

The naivebayes package offers several ways to peek inside a Naive Bayes model.

Typing the name of the model object prints the a priori (overall) probability of each location and the conditional probabilities of each predictor value given the location. If you are so inclined, you can use these to calculate the posterior (predicted) probabilities by hand.
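
As a minimal sketch of that hand calculation, the base R snippet below applies Bayes' rule for a weekday morning. The numbers are copied from the model printout shown in the solution further down; the variable names (prior, p_weekday, posterior_weekday) are illustrative only and are not part of the naivebayes package.

```r
locations <- c("campus", "class", "dorm", "tutoring")

# A priori probabilities, copied from the printed model
prior <- c(0.10989011, 0.42857143, 0.45054945, 0.01098901)
names(prior) <- locations

# P(daytype = "weekday" | location), copied from the printed table
p_weekday <- c(1, 1, 0.3658537, 1)
names(p_weekday) <- locations

# Bayes' rule: the posterior is proportional to prior * likelihood,
# then rescaled so the four probabilities sum to 1
joint <- prior * p_weekday
posterior_weekday <- joint / sum(joint)

print(round(posterior_weekday, 4))
```

For a weekday morning, class comes out at roughly 0.6, in line with the prediction of class in the first exercise.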

Alternatively, R will compute the posterior probabilities for you if the type = "prob" parameter is supplied to the predict() function.

Using these methods, examine how the model’s predicted morning location probabilities differ between weekdays and weekends. The model locmodel that you fit in the previous exercise is still in your workspace.

Solution

  • Print the locmodel object to the console to view the computed a priori and conditional probabilities.
  • Use the predict() function as in the previous exercise, but with type = "prob", to see the predicted probabilities for John’s location on a weekday morning.
  • Compare these to the predicted probabilities for John’s location on a weekend morning.
# The 'naivebayes' package is loaded into the workspace
# and the Naive Bayes 'locmodel' has been built

# Examine the location prediction model
locmodel
## 
## ================================== Naive Bayes ================================== 
##  
##  Call: 
## naive_bayes.formula(formula = location ~ daytype, data = locations_data)
## 
## --------------------------------------------------------------------------------- 
##  
## Laplace smoothing: 0
## 
## --------------------------------------------------------------------------------- 
##  
##  A priori probabilities: 
## 
##     campus      class       dorm   tutoring 
## 0.10989011 0.42857143 0.45054945 0.01098901 
## 
## --------------------------------------------------------------------------------- 
##  
##  Tables: 
## 
## --------------------------------------------------------------------------------- 
##  ::: daytype (Bernoulli) 
## --------------------------------------------------------------------------------- 
##          
## daytype      campus     class      dorm  tutoring
##   weekday 1.0000000 1.0000000 0.3658537 1.0000000
##   weekend 0.0000000 0.0000000 0.6341463 0.0000000
## 
## ---------------------------------------------------------------------------------
# Obtain the predicted probabilities for a weekday morning
predict(locmodel, weekday_morning, type = "prob")
##         campus class      dorm   tutoring
## [1,] 0.1538462   0.6 0.2307692 0.01538462
# Obtain the predicted probabilities for a weekend morning
predict(locmodel, weekend_morning, type = "prob")
##            campus       class      dorm     tutoring
## [1,] 0.0003838772 0.001497121 0.9980806 3.838772e-05
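
The weekend probabilities for campus, class and tutoring are tiny but not zero, even though the conditional table lists 0 for them and Laplace smoothing is 0. A likely explanation, stated here as an assumption, is that predict() replaces zero conditional probabilities with a small threshold (0.001 by default in the versions of naivebayes I am aware of; check ?predict.naive_bayes for your installed version). The base R sketch below redoes the weekend calculation by hand under that assumption, with numbers copied from the model printout; the variable names are illustrative.

```r
locations <- c("campus", "class", "dorm", "tutoring")

# A priori probabilities, copied from the printed model
prior <- c(0.10989011, 0.42857143, 0.45054945, 0.01098901)
names(prior) <- locations

# P(daytype = "weekend" | location), copied from the printed table
p_weekend <- c(0, 0, 0.6341463, 0)
names(p_weekend) <- locations

# ASSUMPTION: zero probabilities are replaced by a threshold of 0.001
threshold <- 0.001
p_weekend_adj <- p_weekend
p_weekend_adj[p_weekend_adj == 0] <- threshold

# Bayes' rule with the adjusted likelihoods
joint <- prior * p_weekend_adj
posterior_weekend <- joint / sum(joint)

print(signif(posterior_weekend, 4))
```

If the assumption holds, these values agree with the predict() output above. With a strict zero instead of the threshold, dorm would get probability 1 and the other three locations exactly 0.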

Final notes

Some of you with more experience in machine learning may have encountered similar exercises when you first learned about the Naive Bayes classifier. That isn’t a coincidence. The data file that we used here is adapted from one very commonly used in Naive Bayes tutorials. However, please remember that what matters is not so much the actual data you use as how you frame the problem.


Last modified on 2021-04-07