First, load the dataset directly from the following URL:
mydata <- read.csv("https://ximarketing.github.io/class/TripAdvisor.csv",
fileEncoding = "UTF-8-BOM")
summary(mydata)
## Name Local CountRestaurant CountReview
## Length:149912 Min. :0.0000 Min. : 0.00 Length:149912
## Class :character 1st Qu.:0.0000 1st Qu.: 5.00 Class :character
## Mode :character Median :0.0000 Median : 16.00 Mode :character
## Mean :0.2012 Mean : 35.67
## 3rd Qu.:0.0000 3rd Qu.: 42.00
## Max. :1.0000 Max. :2149.00
## CountVotes Rating Helpful Mobile
## Length:149912 Min. :1.000 Min. : 0.0000 Min. :0.0000
## Class :character 1st Qu.:4.000 1st Qu.: 0.0000 1st Qu.:0.0000
## Mode :character Median :5.000 Median : 0.0000 Median :0.0000
## Mean :4.505 Mean : 0.5684 Mean :0.2052
## 3rd Qu.:5.000 3rd Qu.: 1.0000 3rd Qu.:0.0000
## Max. :5.000 Max. :206.0000 Max. :1.0000
## TitleLength Length Photo Date
## Min. : 0.0 Min. : 0.0 Min. :0.0000 Length:149912
## 1st Qu.: 15.0 1st Qu.: 166.0 1st Qu.:0.0000 Class :character
## Median : 23.0 Median : 278.0 Median :0.0000 Mode :character
## Mean : 25.6 Mean : 357.6 Mean :0.1765
## 3rd Qu.: 33.0 3rd Qu.: 461.0 3rd Qu.:0.0000
## Max. :128.0 Max. :5884.0 Max. :8.0000
## Positive Negative Subjectivity Menu
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.04412 1st Qu.:0.000000 1st Qu.:0.07014 1st Qu.:0.000000
## Median :0.06818 Median :0.003497 Median :0.17417 Median :0.000000
## Mean :0.07593 Mean :0.012575 Mean :0.25061 Mean :0.001047
## 3rd Qu.:0.09804 3rd Qu.:0.020270 3rd Qu.:0.37500 3rd Qu.:0.000000
## Max. :1.00000 Max. :0.500000 Max. :1.00000 Max. :1.000000
## Building Meat Vegetable Person
## Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.000000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.000000
## Median :0.00000 Median :0.0000 Median :0.00000 Median :0.000000
## Mean :0.01006 Mean :0.0313 Mean :0.03086 Mean :0.006237
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.000000
## Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.000000
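Note that the summary reports CountReview and CountVotes as character rather than numeric, which means read.csv could not parse at least some of their values as numbers. The following is a minimal sketch of how one might inspect and convert such a column. It uses a small made-up data frame (`toy`), and the idea that thousands separators such as "1,204" are the cause is only an assumption; look at the actual values before applying anything like this to mydata.

```r
# Toy data standing in for the real file (values are invented for illustration).
toy <- read.csv(text = 'Name,CountReview
"A","1,204"
"B","35"', stringsAsFactors = FALSE)

# Check how each column was read; CountReview comes back as character here.
col_classes <- sapply(toy, class)
print(col_classes)

# Assumed fix: strip thousands separators, then convert to numeric.
toy$CountReviewNum <- as.numeric(gsub(",", "", toy$CountReview, fixed = TRUE))
print(toy$CountReviewNum)
```

If the conversion produces NA values, the column contains something other than digits and commas, and those entries need to be examined individually.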
Next, convert each date into its day of the week (Monday, Tuesday, …) using R's strftime function:
mydata$Weekday <- strftime(mydata$Date, "%A")
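One caveat: "%A" returns weekday names in the session's locale, so on a non-English system the factor levels (and the coefficient labels in later output) may look different. The sketch below shows one possible way to get a locale-independent weekday number and a weekend indicator. It uses toy dates and assumes they are in ISO "YYYY-MM-DD" format; the format of the real Date column is not shown here, and `weekend` is a hypothetical helper name.

```r
# Toy dates in ISO format; the real Date column's format is an assumption here.
dates <- as.Date(c("2021-03-05", "2021-03-06", "2021-03-07", "2021-03-08"))

# "%A" gives the weekday name in the current locale (e.g. "Friday" in English).
weekday_name <- format(dates, "%A")

# "%u" gives the ISO weekday number (1 = Monday, ..., 7 = Sunday),
# which does not depend on the locale.
weekday_num <- as.integer(format(dates, "%u"))

# Hypothetical helper column: 1 for Saturday/Sunday, 0 otherwise.
weekend <- as.integer(weekday_num >= 6)

print(data.frame(dates, weekday_name, weekday_num, weekend))
```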
Suppose we want to know whether local reviewers are tougher or more lenient than non-local reviewers, and whether reviewers behave differently on weekends. We can run the following regression:
result <- lm(Rating ~ Local + factor(Weekday), data = mydata)
summary(result)
##
## Call:
## lm(formula = Rating ~ Local + factor(Weekday), data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5396 -0.5124 0.4761 0.4876 0.5765
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.532581 0.006238 726.608 < 2e-16 ***
## Local -0.084126 0.005365 -15.679 < 2e-16 ***
## factor(Weekday)Monday -0.015763 0.008183 -1.926 0.05406 .
## factor(Weekday)Saturday -0.024908 0.008638 -2.883 0.00393 **
## factor(Weekday)Sunday -0.020167 0.008419 -2.396 0.01660 *
## factor(Weekday)Thursday -0.012348 0.008503 -1.452 0.14646
## factor(Weekday)Tuesday 0.007067 0.008048 0.878 0.37986
## factor(Weekday)Wednesday -0.008706 0.008306 -1.048 0.29453
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.832 on 149904 degrees of freedom
## Multiple R-squared: 0.001761, Adjusted R-squared: 0.001714
## F-statistic: 37.78 on 7 and 149904 DF, p-value: < 2.2e-16
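As you can see, the coefficient on Local is -0.084 and highly significant, so local reviewers give ratings about 0.08 points lower on average, holding the weekday fixed. The weekday coefficients are measured relative to Friday, the omitted baseline level: Saturday (-0.025) and Sunday (-0.020) are both significantly negative, so reviews written on weekends are slightly tougher than those written on Fridays. Both effects are small, and the model explains only about 0.2% of the variance in ratings (R-squared = 0.0018).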
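The regression above compares each day with a single baseline day rather than weekends with weekdays as a group. If the question is specifically about weekends, one option is to replace the seven-level factor with a single indicator, or to keep the factor and choose its baseline explicitly with relevel. The sketch below illustrates both on simulated data; the data frame `sim`, the `Weekend` column, and all effect sizes are invented for illustration and are not the TripAdvisor data.

```r
set.seed(1)
n <- 2000

# Simulated stand-in for the review data; effect sizes are invented.
sim <- data.frame(
  Local   = rbinom(n, 1, 0.2),
  Weekday = sample(c("Monday", "Tuesday", "Wednesday", "Thursday",
                     "Friday", "Saturday", "Sunday"), n, replace = TRUE)
)
sim$Weekend <- as.integer(sim$Weekday %in% c("Saturday", "Sunday"))
sim$Rating  <- 4.5 - 0.5 * sim$Local - 0.3 * sim$Weekend + rnorm(n, sd = 0.5)

# A single Weekend dummy compares Saturday/Sunday with all weekdays pooled.
fit_weekend <- lm(Rating ~ Local + Weekend, data = sim)
print(coef(summary(fit_weekend)))

# Alternatively, keep the full factor but choose the baseline explicitly.
sim$Weekday <- relevel(factor(sim$Weekday), ref = "Monday")
fit_factor <- lm(Rating ~ Local + Weekday, data = sim)
print(names(coef(fit_factor)))
```

With the Weekend dummy, the hypothesis is answered by one coefficient and one p-value instead of by reading across six day-level contrasts.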