Scraper

Install the rvest package first (for example with install.packages("rvest")), then load it:

library(rvest)

## Warning: package 'rvest' was built under R version 4.1.2

Download and parse the HTML source of the marketing faculty listing page:

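# Listing of HKU FBE faculty in the marketing subject area (first page)
url <- "https://www.fbe.hku.hk/people/faculty?pg=1&staff_type=faculty&subject_area=marketing&track=all"
# Download and parse the page into an HTML document object
webpage <- read_html(url, encoding = "UTF-8")
print(webpage)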

## {html_document}
## <html lang="en-US" prefix="og: https://ogp.me/ns#">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body class="page-template page-template-people-listing page-template-peo ...
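The query string in the URL controls what the listing shows: `pg` appears to be the page number and `subject_area` the department filter. A small helper for building such URLs is sketched below. `faculty_url()` is a hypothetical name, and the assumption that the site accepts other `subject_area` values or later `pg` pages has not been checked.

```r
# Hypothetical helper: builds a listing URL from the query parameters seen in
# the original URL. Assumes the site keeps these parameter names.
faculty_url <- function(subject_area = "marketing", pg = 1,
                        staff_type = "faculty", track = "all") {
  base <- "https://www.fbe.hku.hk/people/faculty"
  params <- c(pg = pg, staff_type = staff_type,
              subject_area = subject_area, track = track)
  # Percent-encode each value so spaces or '&' cannot break the query string
  values <- vapply(as.character(params), URLencode, character(1),
                   reserved = TRUE)
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base, "?", query)
}

# With the defaults this reproduces the URL used above
faculty_url()
```

The result can be passed straight to `read_html()`, e.g. `read_html(faculty_url(pg = 2), encoding = "UTF-8")`, provided such a page exists.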

Select every div whose class is h5 (these hold the faculty names) and print its text:

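# Each faculty name sits in a <div class="h5"> element.
# html_nodes() is called html_elements() in rvest >= 1.0.0; both work.
nodes <- html_nodes(webpage, xpath = '//div[@class="h5"]')
# Print the text of each matched node on its own line
for (node in nodes) {
  print(html_text(node))
}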

## [1] "Dr. Jingcun CAO"
## [1] "Mr. Baniel CHEUNG"
## [1] "Dr. Buston Yat Chiu CHU"
## [1] "Dr. Chu (Ivy) DANG"
## [1] "Dr. Jinzhao DU"
## [1] "Dr. Tingting FAN"
## [1] "Dr. Tak Zhongqiang HUANG"
## [1] "Dr. Jayson Shi JIA"
## [1] "Dr. Michael He JIA"
## [1] "Prof. Sara KIM"
## [1] "Dr. Bernard LEE"
## [1] "Dr. Xi LI"
## [1] "Dr. Yin Mei NG"
## [1] "Dr. Tuan Quang PHAN"
## [1] "Mr. Sean RACH"
## [1] "Prof. Echo Wen WAN"
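The XPath `//div[@class="h5"]` compares the whole `class` attribute, so it would miss an element whose class were, say, `"h5 name"`. The sketch below runs offline against a small hand-written snippet (hypothetical markup, not the real page) to compare that exact match with the CSS selector `div.h5`, which matches h5 as one class among several, and then splits the honorific from the name. It assumes rvest 1.0.0 or later for `minimal_html()` and `html_elements()`.

```r
library(rvest)

# Hypothetical markup imitating the listing; the real page may differ.
html <- minimal_html('
  <div class="h5">Dr. Xi LI</div>
  <div class="h5 name">Prof. Sara KIM</div>
  <div class="h6">Not a name</div>
')

# Exact attribute match, as in the scraper above: only class="h5" qualifies
exact <- html_text(html_elements(html, xpath = '//div[@class="h5"]'),
                   trim = TRUE)

# CSS class selector: matches any div whose class list includes h5
by_class <- html_text(html_elements(html, css = "div.h5"), trim = TRUE)

# Split a leading honorific (Dr., Mr., Prof.) from the rest of the name;
# strings without one of these prefixes are left unchanged in both columns.
pattern <- "^((Dr|Mr|Prof)\\.)\\s+"
people <- data.frame(
  title = sub(paste0(pattern, ".*$"), "\\1", by_class),
  name  = sub(pattern, "", by_class)
)
print(people)
```

On this snippet `exact` keeps only the first name, while `by_class` keeps both, which is why a class selector is the more forgiving choice if the site ever adds extra classes.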


Xi Li

12/30/2021