lastmod: 11 May, 2020
Preamble
Suppose that our University has a surveillance system in place that can track residents’ movements while they are on campus. For our exercise, let’s assume we want to find out where one of the students, creatively named John, is on a given day and at a given time of day. To do that, we extract only the information we need from our big, big data about all students, i.e., the type of day (weekday vs. weekend) and John’s known past locations in the morning (say, between 08:00 and 12:00).
Finding John’s location depending on the day type (weekday or weekend)
- Load the naivebayes package.
- Use naive_bayes() with a formula like y ~ x to build a model of location as a function of daytype.
- Forecast John’s location during a weekday morning, using predict() with the weekday_morning object as the newdata argument.
- Repeat the prediction for a weekend morning, using the weekend_morning object.
Solution
# prepare the environment
library(tidyverse)
library(readr)
locations_data <- readr::read_csv(here::here("content/post/ml-nb-predictloc/_data/locations_john_morning.csv"))
# one-row data frame describing a weekday morning (used as newdata for predict())
weekday_morning <- locations_data %>% dplyr::select(daytype) %>% dplyr::filter(daytype == "weekday") %>% dplyr::slice(n())
# one-row data frame describing a weekend morning (used as newdata for predict())
weekend_morning <- locations_data %>% dplyr::select(daytype) %>% dplyr::filter(daytype == "weekend") %>% dplyr::slice(n())
# Load the naivebayes package
library(naivebayes)
# Build the location prediction model
locmodel <- naive_bayes(location ~ daytype, data = locations_data)
# Predict location on a week day
predict(locmodel, weekday_morning)
## [1] class
## Levels: campus class dorm tutoring
# Predict location on a weekend
predict(locmodel, weekend_morning)
## [1] dorm
## Levels: campus class dorm tutoring
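To see what naive_bayes(location ~ daytype) estimates, here is a minimal base-R sketch of the same arithmetic on a small hypothetical data frame (toy, invented for illustration; it is not the contents of locations_john_morning.csv). The a priori probabilities are the relative frequencies of each location, and the conditional probabilities are the share of each day type within each location. The names toy, prior, cond and posterior() are illustrative only.

```r
# Hypothetical toy data (NOT the locations_john_morning.csv file):
# each row is one observed morning with its day type and John's location.
toy <- data.frame(
  daytype  = c("weekday", "weekday", "weekday", "weekday", "weekend", "weekend"),
  location = c("class",   "class",   "dorm",    "campus",  "dorm",    "dorm")
)

# A priori probabilities P(location): relative frequency of each location
counts <- table(toy$location)
prior  <- setNames(as.vector(counts) / sum(counts), names(counts))

# Conditional probabilities P(daytype | location): each column sums to 1
cond <- prop.table(table(toy$daytype, toy$location), margin = 2)

# Posterior P(location | daytype) without smoothing:
# prior times likelihood, renormalised so the values sum to 1
posterior <- function(day) {
  likelihood <- as.vector(cond[day, names(prior)])
  unnorm <- prior * likelihood
  unnorm / sum(unnorm)
}

post_weekday <- posterior("weekday")
post_weekend <- posterior("weekend")
print(round(post_weekday, 3))
print(round(post_weekend, 3))

# The predicted class is the location with the highest posterior
names(post_weekday)[which.max(post_weekday)]
```

Like locmodel, which is fit without Laplace smoothing, this sketch gives a posterior of exactly zero to any location never observed on a given day type (here, campus and class on weekends).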
Find the “raw” probabilities of John being at those locations
The naivebayes package offers several ways to peek inside a Naive Bayes model.
Typing the name of the model object prints the a priori (overall) probabilities of each location and the conditional probabilities of each predictor given the location. If you were so inclined, you could use these to calculate the posterior (predicted) probabilities by hand.
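As a minimal sketch of that hand calculation, the R code below plugs the a priori probabilities and the weekday row of the conditional table (values copied from the locmodel printout in the solution below) into Bayes’ rule: multiply each location’s prior by P(daytype = weekday | location), then divide by the sum so the results add up to 1. The variable names are illustrative.

```r
# Values copied from the locmodel printout
prior <- c(campus = 0.10989011, class = 0.42857143,
           dorm = 0.45054945, tutoring = 0.01098901)
# Conditional probabilities P(daytype = "weekday" | location)
p_weekday <- c(campus = 1.0000000, class = 1.0000000,
               dorm = 0.3658537, tutoring = 1.0000000)

# Bayes' rule: prior x likelihood, normalised to sum to 1
unnorm   <- prior * p_weekday
post_day <- unnorm / sum(unnorm)
print(round(post_day, 7))

# Most probable location
names(post_day)[which.max(post_day)]
```

The result should match the predict(locmodel, weekday_morning, type = "prob") output in the solution (about 0.154, 0.6, 0.231 and 0.015), and the most probable location is class, the same class predicted in the previous exercise.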
Alternatively, R will compute the posterior probabilities for you if the type = "prob" argument is supplied to the predict() function.
Using these methods, examine how the model’s predicted morning location probabilities differ between weekdays and weekends. The model locmodel that you fit in the previous exercise is in your workspace.
Solution
- Print the locmodel object to the console to view the computed a priori and conditional probabilities.
- Use the predict() function as in the previous exercise, but with type = "prob", to see the predicted probabilities for John’s location on a weekday morning.
- Compare these to the predicted probabilities for John’s location on a weekend morning.
# The 'naivebayes' package is loaded into the workspace
# and the Naive Bayes 'locmodel' has been built
# Examine the location prediction model
locmodel
##
## ================================== Naive Bayes ==================================
##
## Call:
## naive_bayes.formula(formula = location ~ daytype, data = locations_data)
##
## ---------------------------------------------------------------------------------
##
## Laplace smoothing: 0
##
## ---------------------------------------------------------------------------------
##
## A priori probabilities:
##
## campus class dorm tutoring
## 0.10989011 0.42857143 0.45054945 0.01098901
##
## ---------------------------------------------------------------------------------
##
## Tables:
##
## ---------------------------------------------------------------------------------
## ::: daytype (Bernoulli)
## ---------------------------------------------------------------------------------
##
## daytype campus class dorm tutoring
## weekday 1.0000000 1.0000000 0.3658537 1.0000000
## weekend 0.0000000 0.0000000 0.6341463 0.0000000
##
## ---------------------------------------------------------------------------------
# Obtain the predicted probabilities for a week day morning
predict(locmodel, weekday_morning, type = "prob")
## campus class dorm tutoring
## [1,] 0.1538462 0.6 0.2307692 0.01538462
# Obtain the predicted probabilities for a weekend morning
predict(locmodel, weekend_morning, type = "prob")
## campus class dorm tutoring
## [1,] 0.0003838772 0.001497121 0.9980806 3.838772e-05
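Note that the weekend probabilities are not exactly 0 and 1, even though the conditional table gives P(weekend | location) = 0 for campus, class and tutoring. The printed values are consistent with predict() replacing those zero conditional probabilities with a small number before applying Bayes’ rule; in the naivebayes documentation this is the threshold argument of predict(), with a default of 0.001 (treat that default as an assumption to check against your installed version). A minimal base-R sketch that reproduces the output under that assumption:

```r
# Values copied from the locmodel printout
prior <- c(campus = 0.10989011, class = 0.42857143,
           dorm = 0.45054945, tutoring = 0.01098901)
# Conditional probabilities P(daytype = "weekend" | location)
p_weekend <- c(campus = 0, class = 0, dorm = 0.6341463, tutoring = 0)

# Assumption: zero probabilities are replaced by predict()'s default threshold
threshold <- 0.001
p_adj <- p_weekend
p_adj[p_adj == 0] <- threshold

# Bayes' rule with the adjusted likelihoods
unnorm   <- prior * p_adj
post_end <- unnorm / sum(unnorm)
print(signif(post_end, 7))
```

Without the replacement, dorm would get a posterior of exactly 1. The "Laplace smoothing: 0" line in the printout shows that no smoothing was applied when fitting; passing a positive laplace value to naive_bayes() is the usual alternative for keeping unseen location/day-type combinations away from zero probability.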
Final notes
Some of you with more experience in machine learning may have encountered similar exercises while learning about the Naive Bayes classifier. This isn’t a coincidence: the data file used here is adapted from one that is very commonly used in Naive Bayes tutorials. Remember, though, that what matters is not so much the actual data you use as how you frame the problem.
Last modified on 2021-04-07