The result shows that President Kaler used all of the terms one would expect in a State of the University Address. The terms "students", "faculty", and "research" are all prominent, as are "budget", "tuition", "balancing", "support", and "learning", along with other administrative catch-all words.
The code I used was:
library(tm)
library(wordcloud)
## Read in the data from a folder which contains the text document(s)
(ovid <- Corpus(DirSource("/Users/andrewz/Documents/Data/State-of-the-University/"),
                readerControl = list(reader = readPlain)))
## Document preparation
sotu <- tm_map(ovid, removePunctuation, preserve_intra_word_dashes = TRUE)
sotu <- tm_map(sotu, removeNumbers)
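## In tm 0.6 and later, base functions such as tolower must be wrapped in content_transformer()
sotu <- tm_map(sotu, content_transformer(tolower))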
sotu <- tm_map(sotu, stripWhitespace)
sotu <- tm_map(sotu, removeWords, stopwords("english"))
sotu <- tm_map(sotu, stripWhitespace)
## Create document-term matrix
tdm <- DocumentTermMatrix(sotu)
m <- as.matrix(tdm)
v <- sort(colSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
## Plot
pal <- brewer.pal(7, "Set3")
pdf("/Users/andrewz/Desktop/SOTU.pdf", width = 8.33, height = 6.67, bg = "black")
wordcloud(d$word, d$freq,
  #scale = c(8, 0.3),
  min.freq = 3,
  #max.words = 100,
  #random.order = TRUE,
  rot.per = 0.15,
  colors = pal,
  vfont = c("sans serif", "plain")
)
dev.off()
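To make the counting step less of a black box, here is a minimal base R sketch of roughly what the preparation and document-term matrix steps boil down to for a single document. It is only an illustration, not how tm works internally: the sample sentence, the two-word stop list, and the names sample_text, stop_words_demo, and d_demo are all made up for this example.

```r
## Hypothetical sample text; it stands in for the speech transcript
sample_text <- "Students and faculty support research; the students support learning."

## Lowercase, then replace anything that is not a letter, whitespace, or dash with a space
clean <- gsub("[^[:alpha:][:space:]-]", " ", tolower(sample_text))

## Split on runs of whitespace and drop any empty strings
words <- strsplit(clean, "[[:space:]]+")[[1]]
words <- words[nzchar(words)]

## Tiny illustrative stop-word list (tm's stopwords("english") is much longer)
stop_words_demo <- c("and", "the")
words <- words[!(words %in% stop_words_demo)]

## Count each remaining word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)
d_demo <- data.frame(word = names(freq), freq = as.vector(freq),
                     stringsAsFactors = FALSE)
print(d_demo)
```

The resulting data frame has the same two columns (word and freq) as d above, so it could be passed to wordcloud() in the same way.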
The second word cloud is based on my Google Scholar page. The cloud on the left-hand side shows my co-authors (sized by how often each appears), and the cloud on the right-hand side shows terms that appear in the work linked to my Scholar page.
The summary citation information can also be printed in R. Mine is:
Total papers = 20
Median citations per paper = 1.5
Median (citations / # of authors) per paper = 0.4166667
H-index = 6
G-index = 9
M-index = 1
First author H-index = 4
Last author H-index = 2
First or last author H-index = 5
First or second author H-index = 5
The code is below:
source("http://biostat.jhsph.edu/~jleek/code/googleCite.r")
out <- googleCite("http://scholar.google.com/citations?user=cWpN_s8AAAAJ&hl=en",
                  pdfname = "/Users/andrewz/Desktop/Zieffler_wordcloud.pdf")
gcSummary(out)
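For anyone unfamiliar with the H-index and G-index reported in the summary, here is a small sketch of the standard definitions in base R. I am assuming the usual textbook definitions; I have not checked that googleCite.r computes them in exactly this way. The function names h_index and g_index and the citation counts are hypothetical, not my actual data.

```r
## H-index: the largest h such that h papers each have at least h citations
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

## G-index: the largest g such that the top g papers together have at least
## g^2 citations (capped here at the number of papers)
g_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  sum(cumsum(sorted) >= seq_along(sorted)^2)
}

## Hypothetical citation counts for six papers
cites_demo <- c(10, 8, 5, 4, 3, 0)
h_demo <- h_index(cites_demo)
g_demo <- g_index(cites_demo)
print(c(H = h_demo, G = g_demo))
```

With these made-up counts, four papers have at least four citations each, so H is 4; the top five papers have 30 citations in total, which is at least 25 but short of 36, so G is 5.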