First, load the dataset directly from the following URL:
mydata <- read.csv("https://ximarketing.github.io/class/ABOM/TripAdvisor.csv",
fileEncoding = "UTF-8-BOM")
summary(mydata)
## Name Local CountRestaurant CountReview
## Length:149912 Min. :0.0000 Length:149912 Length:149912
## Class :character 1st Qu.:0.0000 Class :character Class :character
## Mode :character Median :0.0000 Mode :character Mode :character
## Mean :0.2012
## 3rd Qu.:0.0000
## Max. :1.0000
## CountVotes Rating Date Month
## Length:149912 Min. :1.000 Length:149912 Length:149912
## Class :character 1st Qu.:4.000 Class :character Class :character
## Mode :character Median :5.000 Mode :character Mode :character
## Mean :4.505
## 3rd Qu.:5.000
## Max. :5.000
## TitleLength Length Mobile Sentiment
## Min. : 0.0 Min. : 0.0 Length:149912 Min. :-1.0000
## 1st Qu.: 15.0 1st Qu.: 166.0 Class :character 1st Qu.: 0.2123
## Median : 23.0 Median : 278.0 Mode :character Median : 0.3463
## Mean : 25.6 Mean : 357.6 Mean : 0.3466
## 3rd Qu.: 33.0 3rd Qu.: 461.0 3rd Qu.: 0.4820
## Max. :128.0 Max. :5884.0 Max. : 1.0000
## Subjectivity Happy Angry Sad
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.4986 1st Qu.:0.1500 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.5879 Median :0.2700 Median :0.00000 Median :0.1400
## Mean :0.5799 Mean :0.2991 Mean :0.03846 Mean :0.1559
## 3rd Qu.:0.6800 3rd Qu.:0.4200 3rd Qu.:0.06000 3rd Qu.:0.2500
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
## Surprise Helpful Photo Menu
## Min. :0.0000 Min. : 0.0000 Min. :0.0000 Min. :0.000000
## 1st Qu.:0.0800 1st Qu.: 0.0000 1st Qu.:0.0000 1st Qu.:0.000000
## Median :0.2000 Median : 0.0000 Median :0.0000 Median :0.000000
## Mean :0.2135 Mean : 0.5684 Mean :0.1765 Mean :0.001047
## 3rd Qu.:0.3200 3rd Qu.: 1.0000 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.0000 Max. :206.0000 Max. :8.0000 Max. :1.000000
## Building Meat Vegetable Person
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.000000
## Mean :0.01006 Mean :0.0313 Mean :0.03086 Mean :0.006237
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.000000
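The summary reports several columns that sound numeric (CountRestaurant, CountReview, CountVotes) as character, so they cannot be used as numbers without conversion. The cause is not visible in the output above. The sketch below assumes, purely for illustration, that the culprit is formatting such as thousands separators or placeholder strings; the vector `raw_counts` is hypothetical toy data, not taken from the TripAdvisor file.

```r
# Hypothetical toy values standing in for a character column such as CountReview.
raw_counts <- c("12", "1,204", "7", "n/a")

# Strip thousands separators (an assumed cause), then convert.
cleaned <- gsub(",", "", raw_counts, fixed = TRUE)

# Entries that are still not numeric become NA; suppress the coercion warning
# and count them explicitly instead.
num_counts <- suppressWarnings(as.numeric(cleaned))
n_failed <- sum(is.na(num_counts))

print(num_counts)
print(n_failed)
```

Counting the NA values after conversion is a quick way to see how many entries did not parse, before deciding how to handle them.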
Let us convert the date into weekdays (i.e., Monday, Tuesday, …) using the strftime function of R:
mydata$Weekday <- strftime(mydata$Date, "%A")
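Because Date is stored as character, the conversion relies on R guessing the date format, and the weekday names returned by "%A" depend on the session locale. A minimal sketch of a more explicit version follows. It assumes ISO "YYYY-MM-DD" strings (the actual format of the Date column is not shown here), and `toy_dates` is hypothetical example data.

```r
# Hypothetical toy dates, assumed to be ISO "YYYY-MM-DD" strings.
toy_dates <- c("2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09")

# Parse with an explicit format instead of letting R guess.
parsed <- as.Date(toy_dates, format = "%Y-%m-%d")

# "%A" gives the weekday name in the current locale (e.g. "Friday" in English).
weekday_name <- format(parsed, "%A")

# "%u" gives the weekday as a number (1 = Monday, ..., 7 = Sunday),
# which does not depend on the locale.
weekday_num <- as.integer(format(parsed, "%u"))

# Saturday (6) and Sunday (7) count as the weekend.
is_weekend <- weekday_num >= 6

print(data.frame(toy_dates, weekday_name, weekday_num, is_weekend))
```

Any string that does not match the given format is parsed as NA, so checking `sum(is.na(parsed))` afterwards shows whether the assumed format fits the data.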
Suppose we want to know whether local reviewers are tougher or nicer than other reviewers, and whether people rate differently on weekends. We can run the following regression:
result <- lm(Rating ~ Local + factor(Weekday), data = mydata)
summary(result)
##
## Call:
## lm(formula = Rating ~ Local + factor(Weekday), data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5396 -0.5124 0.4761 0.4876 0.5765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.532581 0.006238 726.608 < 2e-16 ***
## Local -0.084126 0.005365 -15.679 < 2e-16 ***
## factor(Weekday)Monday -0.015763 0.008183 -1.926 0.05406 .
## factor(Weekday)Saturday -0.024908 0.008638 -2.883 0.00393 **
## factor(Weekday)Sunday -0.020167 0.008419 -2.396 0.01660 *
## factor(Weekday)Thursday -0.012348 0.008503 -1.452 0.14646
## factor(Weekday)Tuesday 0.007067 0.008048 0.878 0.37986
## factor(Weekday)Wednesday -0.008706 0.008306 -1.048 0.29453
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.832 on 149904 degrees of freedom
## Multiple R-squared: 0.001761, Adjusted R-squared: 0.001714
## F-statistic: 37.78 on 7 and 149904 DF, p-value: < 2.2e-16
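As you can see, local reviewers are tougher: holding the weekday fixed, their ratings are about 0.08 points lower than those of non-local reviewers (p < 2e-16). The weekday coefficients are measured against Friday, the omitted baseline level. Relative to Friday, ratings are about 0.025 points lower on Saturday and 0.020 points lower on Sunday, both significant at the 5% level, so people write somewhat tougher reviews on weekends. The differences are small, though, and the model explains very little of the variation in ratings (R-squared of about 0.002).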
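The regression with one dummy per weekday compares each day with the omitted baseline day rather than weekends with weekdays as a group. One way to test the weekend question directly is to replace the weekday factor with a single weekend indicator. The sketch below illustrates this on simulated data: the data frame `toy`, the `Weekend` column, and the effect sizes used to generate `Rating` are all assumptions made for illustration and are not estimates from the TripAdvisor data.

```r
set.seed(1)
n <- 500
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Simulated stand-in for mydata (hypothetical values, not the real dataset).
toy <- data.frame(
  Local   = rbinom(n, 1, 0.2),
  Weekday = sample(days, n, replace = TRUE)
)

# Single indicator: 1 on Saturday or Sunday, 0 otherwise.
toy$Weekend <- as.integer(toy$Weekday %in% c("Saturday", "Sunday"))

# Simulated ratings with assumed effects for Local and Weekend.
toy$Rating <- 4.5 - 0.1 * toy$Local - 0.05 * toy$Weekend + rnorm(n, sd = 0.8)

# The Weekend coefficient is the average weekend vs. weekday difference,
# holding Local fixed.
fit <- lm(Rating ~ Local + Weekend, data = toy)
print(coef(summary(fit)))
```

On the real data, the same idea would mean creating `mydata$Weekend` from `mydata$Weekday` and fitting `lm(Rating ~ Local + Weekend, data = mydata)`, assuming the weekday names are in English.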