First, you can directly load the dataset from the following URL:
mydata <- read.csv("https://ximarketing.github.io/class/Kickstarter-Project.csv",
                   fileEncoding = "UTF-8-BOM")
summary(mydata)
## URL Outcome Target FundingRaised
## Length:6958 Min. :0.0000 Min. : 1 Min. : 0
## Class :character 1st Qu.:0.0000 1st Qu.: 7000 1st Qu.: 35
## Mode :character Median :0.0000 Median : 20000 Median : 1000
## Mean :0.3067 Mean : 114663 Mean : 39350
## 3rd Qu.:1.0000 3rd Qu.: 50000 3rd Qu.: 12302
## Max. :1.0000 Max. :100000000 Max. :6225355
## Backers Comments Location Subtype
## Min. : 0.0 Length:6958 Length:6958 Length:6958
## 1st Qu.: 2.0 Class :character Class :character Class :character
## Median : 14.0 Mode :character Mode :character Mode :character
## Mean : 321.9
## 3rd Qu.: 109.0
## Max. :105857.0
## Duration PhotosNumber NumberOfProducts Price
## Min. : 1.00 Min. : 0.00 Min. : 1.000 Min. : 0.0
## 1st Qu.:30.00 1st Qu.: 0.00 1st Qu.: 4.000 1st Qu.: 35.0
## Median :30.00 Median : 4.00 Median : 7.000 Median : 75.0
## Mean :35.65 Mean : 9.09 Mean : 7.367 Mean : 220.1
## 3rd Qu.:41.00 3rd Qu.: 13.00 3rd Qu.:10.000 3rd Qu.: 160.0
## Max. :90.00 Max. :111.00 Max. :64.000 Max. :10000.0
## Updates Gender Created Backed
## Min. : 0.000 Length:6958 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 1.000 Mode :character Median : 0.0000 Median : 0.000
## Mean : 6.057 Mean : 0.7305 Mean : 4.127
## 3rd Qu.: 8.000 3rd Qu.: 0.0000 3rd Qu.: 3.000
## Max. :118.000 Max. :34.0000 Max. :749.000
## FbNumber IsVideoAvailable VideoURL VideoLength
## Min. : 0.0 Min. :0.0000 Length:6958 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.:1.0000 Class :character 1st Qu.: 20.0
## Median : 0.0 Median :1.0000 Mode :character Median : 120.0
## Mean : 290.2 Mean :0.7604 Mean : 128.8
## 3rd Qu.: 331.0 3rd Qu.:1.0000 3rd Qu.: 188.0
## Max. :5000.0 Max. :1.0000 Max. :3576.0
## Human Computer Energy Content
## Min. :0.0000 Min. :0.0000 Min. : 0.000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000
## Median :1.0000 Median :1.0000 Median : 3.160 Median : 0.000
## Mean :0.5961 Mean :0.5296 Mean : 3.789 Mean : 0.205
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 5.267 3rd Qu.: 0.000
## Max. :1.0000 Max. :1.0000 Max. :46.824 Max. :24.000
## Upset Angry MaxAmpVol
## Min. : 0.000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 1.095 Median : 0.0000 Median : 33.59
## Mean : 1.766 Mean : 0.1539 Mean : 36.60
## 3rd Qu.: 2.549 3rd Qu.: 0.0000 3rd Qu.: 51.93
## Max. :19.250 Max. :16.7460 Max. :265.73
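As a quick reading aid for the summary above, one rough way to spot skew is to compare each variable's mean with its median. The sketch below copies the Target and FundingRaised figures from the printed summary into a small data frame. The names `stats` and `mean_to_median` are illustrative and not part of the dataset.

```r
# Mean and median copied by hand from the summary() output above.
stats <- data.frame(
  variable = c("Target", "FundingRaised"),
  mean     = c(114663, 39350),
  median   = c(20000, 1000)
)

# A mean far above the median points to a long right tail.
stats$mean_to_median <- stats$mean / stats$median
print(stats)
```

A ratio close to 1 would suggest a roughly symmetric distribution. Here both ratios are well above 1, which is consistent with a few very large projects pulling the mean upward.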
Because the distributions of Funding Raised and Project Target are highly skewed (in the summary above, each mean is far larger than its median), we apply a logarithmic transformation to both variables using the log function in R. We add 1 before taking the logarithm so that projects with a value of zero do not produce log(0), which is undefined.
mydata$LogTarget <- log(mydata$Target + 1)
mydata$LogFundingRaised <- log(mydata$FundingRaised + 1)
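To see what the "+1" does, here is a minimal sketch that applies the same transformation to the five quartile values of FundingRaised reported in the summary (0, 35, 1000, 12302, 6225355). It also shows base R's `log1p()`, which computes log(1 + x) directly and could be used as an alternative. The vector name `funding` is illustrative.

```r
# Min, quartiles, and max of FundingRaised, copied from the summary above.
funding <- c(0, 35, 1000, 12302, 6225355)

# Without the +1, a zero value maps to -Inf and would break the regression.
log(0)

# The transformation used in this section, and the equivalent log1p() call.
log_plus_one <- log(funding + 1)
log_1p <- log1p(funding)
all.equal(log_plus_one, log_1p)

# On the log scale the values are far more evenly spread.
round(log_plus_one, 2)
```

A zero maps to exactly 0 after the transformation, so projects that raised nothing stay in the data instead of becoming -Inf.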
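Consider the following linear regression, which examines how the funding target and the entrepreneur's gender relate to the total funding raised. We use Log Funding Raised as the dependent variable and Log Target as the independent variable. Gender enters as a fixed effect through factor() because it is a categorical variable rather than a numeric value, so R estimates a separate coefficient for each gender category relative to a baseline category.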
result <- lm(LogFundingRaised ~ LogTarget + factor(Gender), data = mydata)
summary(result)
##
## Call:
## lm(formula = LogFundingRaised ~ LogTarget + factor(Gender), data = mydata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7784 -2.5546 0.5721 2.8824 8.4655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.80035 0.29177 13.025 < 2e-16 ***
## LogTarget 0.24741 0.02629 9.413 < 2e-16 ***
## factor(Gender)M -0.83097 0.16238 -5.117 3.18e-07 ***
## factor(Gender)U 1.42051 0.16469 8.625 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.647 on 6954 degrees of freedom
## Multiple R-squared: 0.09563, Adjusted R-squared: 0.09524
## F-statistic: 245.1 on 3 and 6954 DF, p-value: < 2.2e-16
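From the result, we can see that a higher target is associated with more funding raised: the LogTarget coefficient is positive (about 0.25) and statistically significant. Moreover, compared with females (the baseline category, which has no coefficient of its own), males (M) attract less funding, while projects whose gender is unknown (U) attract more funding.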
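Because the dependent variable is on a log scale, the coefficients can be translated into approximate percentage terms. The sketch below copies the three estimates from the regression output and converts them. It is a rough reading aid, not part of the original analysis: the percentages strictly apply to FundingRaised + 1 and Target + 1, and the names `coefs`, `pct_vs_baseline`, and `pct_for_10pct_target` are illustrative.

```r
# Estimates copied by hand from the regression output above.
coefs <- c(LogTarget = 0.24741, GenderM = -0.83097, GenderU = 1.42051)

# Dummy coefficients: approximate percentage difference in funding
# relative to the baseline gender category, holding LogTarget fixed.
pct_vs_baseline <- (exp(coefs[c("GenderM", "GenderU")]) - 1) * 100
round(pct_vs_baseline, 1)

# Log-log slope: approximate percentage change in funding associated
# with a 10% higher target, holding gender fixed.
pct_for_10pct_target <- (1.10 ^ coefs["LogTarget"] - 1) * 100
round(pct_for_10pct_target, 1)
```

Under these assumptions, the M coefficient corresponds to roughly 56% less funding than the baseline, the U coefficient to roughly four times the baseline, and a 10% higher target to about 2.4% more funding. The R-squared of about 0.10 in the output indicates that most of the variation in funding is left unexplained by these two predictors.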