300958 Social Web Analytics-Typesetting Program Assessment Answer

  1. Complete the data analysis required by the specification.

Write up your analysis using your favourite word processing/typesetting program, making sure that all of the working is shown and that it is presented well.

Identify what the public associates with the company name. The company wants four pieces of analysis to be performed.

What do the results of the test tell us about the company tweets and the random sample of tweets?

  1. Compute the proportion of company tweets in each cluster.

What do these results tell us about the tweet topics from the company and from the public about the company?

Identify any problems with the analytical process used in each part and how the results may have been affected by these problems (do not include programming problems).

Answer:

The given random sample of tweets was loaded and the top 10 words by frequency were obtained by first constructing a term-document matrix and then ranking the words according to their row sums in that matrix. Before this, the corpus was cleaned of symbols such as |, / and @, numbers, extra spaces and common English stopwords like 'the', 'we', etc. The R code used was:

# load the library "tm"
library(tm)

# load the csv file
randomSample=read.csv("/home/prajnan/Downloads/randomSample1.csv",header=TRUE)

# create a corpus from the file, i.e. the data frame
# (newer versions of tm expect the data frame to have doc_id and text columns)
sam=Corpus(DataframeSource(randomSample))

# define a function that replaces a pattern with a space
# (fixed = TRUE so that "|" is matched literally, not as a regex operator)
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x, fixed = TRUE))

# convert '|' to space
sam <- tm_map(sam, toSpace, "|")

# convert '/' to space
sam <- tm_map(sam, toSpace, "/")

# convert '@' to space
sam <- tm_map(sam, toSpace, "@")

# convert text to lower case
sam <- tm_map(sam, content_transformer(tolower))

# remove numbers
sam <- tm_map(sam, removeNumbers)

# remove common stopwords
sam <- tm_map(sam, removeWords, stopwords("english"))

# remove punctuation marks
sam <- tm_map(sam, removePunctuation)

# finally, strip the extra white space
sam <- tm_map(sam, stripWhitespace)

# construct the term-document matrix for the random sample
dtm=TermDocumentMatrix(sam)

# hold the matrix in a variable
m=as.matrix(dtm, sparse=TRUE)

# calculate the row sums of the matrix, i.e. the total frequency of each word
v <- sort(rowSums(m),decreasing=TRUE)

# construct a table that ranks the words by frequency
d <- data.frame(word = names(v),freq=v)

# the 10 most frequent words
head(d, 10)

Since the random sample given was very large, having 6990 rows, we split the given csv file into two parts consisting of 3995 rows and performed the analysis on the two parts separately. On inspecting the term-document matrix, we find that it is highly sparse, i.e., most of its entries are zero. This is why we set 'sparse=TRUE' when we define the matrix variable from the term-document matrix.
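To make the ideas of a term-document matrix, row-sum ranking and sparsity concrete, here is a minimal sketch in base R. It does not use the tm package, and the three example tweets are invented for illustration; it only mirrors the steps described above on a toy scale.

```r
# Toy illustration in base R; the three "tweets" below are invented.
tweets <- c("new car via http link",
            "just like the new car",
            "get time via http")

# Split each tweet into lower-case words.
tokens <- strsplit(tolower(tweets), "[^a-z]+")

# Vocabulary: all distinct words across the tweets.
vocab <- sort(unique(unlist(tokens)))

# Term-document matrix: one row per term, one column per tweet.
tdm <- vapply(tokens,
              function(words) as.integer(table(factor(words, levels = vocab))),
              integer(length(vocab)))
rownames(tdm) <- vocab
colnames(tdm) <- paste0("tweet", seq_along(tweets))

# Rank the terms by total frequency (row sums).
freq <- sort(rowSums(tdm), decreasing = TRUE)
top_terms <- head(data.frame(word = names(freq), freq = freq), 3)
print(top_terms)

# Sparsity: the share of matrix entries that are zero.
sparsity <- mean(tdm == 0)
print(sparsity)
```

Even in this tiny example more than half of the entries are zero, because each tweet uses only a few of the words in the vocabulary; with thousands of tweets the share of zeros becomes much larger.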

The outputs obtained in the two cases are:

[Figure: q1(1).png]

Thus, by comparing the two outputs, we can say that the words 'http', 'tco', 'just', 'https', 'new', 'like', 'via', 'get', 'can' and 'time', in that order, are the top ten words. The random sample therefore shows that several of the most frequent terms ('http', 'tco', 'https', 'via') are fragments of web links and sharing phrases, while the remaining terms are common words used in everyday conversation.

[Figure: q1(2).png]
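Rather than comparing the two rankings by eye, the counts from the two halves of the sample could also be added word by word. The sketch below shows one way to do this in base R; the vectors part1 and part2 and their counts are made up for illustration and are not the actual output.

```r
# Hypothetical word counts from the two halves of the sample (made-up numbers).
part1 <- c(http = 120, tco = 100, just = 40, new = 35)
part2 <- c(http = 110, tco = 95, like = 38, new = 30)

# Union of the two vocabularies.
words <- union(names(part1), names(part2))

# Look up each word in one part; words absent from that part count as 0.
count_in <- function(part, words) {
  counts <- part[words]
  counts[is.na(counts)] <- 0
  unname(counts)
}

# Add the counts word by word and rank the totals.
total <- count_in(part1, words) + count_in(part2, words)
names(total) <- words
combined <- sort(total, decreasing = TRUE)
print(head(combined, 3))
```

Matching by name matters here because a word that appears in only one half would otherwise be dropped or paired with the wrong count.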

The company chosen for the analysis is MercedesBenz. We collected the tweets about the company using the 'twitteR' package. The R program written is:

library(httr)
library(devtools)
library(httpuv)
library(twitteR)

# authorizing with twitter
setup_twitter_oauth("api_key","api_secret", access_token=NULL,access_secret=NULL)

# searching twitter
tw=searchTwitter("MercedesBenz",n=1000,lang="en")

# convert the output to a data frame
df <- do.call("rbind", lapply(tw, as.data.frame))

# write the data frame to a csv file
write.csv(df,file="AboutCompanyTweets.csv")

The term frequencies of the tweets about the company were obtained by taking the row sums of their term-document matrix. This vector was then combined with each of the vectors obtained in the previous part to form a matrix on which the chi-square test was performed. The R code used was:

library(httr)
library(tm)

# function that replaces a pattern with a space
# (fixed = TRUE so that "|" is matched literally, not as a regex operator)
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x, fixed = TRUE))

# first part of the random sample
randomSample=read.csv("/home/prajnan/Downloads/randomSample1.csv",header=TRUE)
sam=Corpus(DataframeSource(randomSample))
sam <- tm_map(sam, toSpace, "|")
sam <- tm_map(sam, toSpace, "/")
sam <- tm_map(sam, toSpace, "@")
sam <- tm_map(sam, content_transformer(tolower))
sam <- tm_map(sam, removeNumbers)
sam <- tm_map(sam, removeWords, stopwords("english"))
sam <- tm_map(sam, removePunctuation)
sam <- tm_map(sam, stripWhitespace)
dtm=TermDocumentMatrix(sam)
m=as.matrix(dtm, sparse=TRUE)

# forming the first vector
v1 <- sort(rowSums(m),decreasing=TRUE)

# second part of the random sample
randomSample=read.csv("/home/prajnan/Downloads/randomSample2016.csv",header=TRUE)
sam=Corpus(DataframeSource(randomSample))
sam <- tm_map(sam, toSpace, "|")
sam <- tm_map(sam, toSpace, "/")
sam <- tm_map(sam, toSpace, "@")
sam <- tm_map(sam, content_transformer(tolower))
sam <- tm_map(sam, removeNumbers)
sam <- tm_map(sam, removeWords, stopwords("english"))
sam <- tm_map(sam, removePunctuation)
sam <- tm_map(sam, stripWhitespace)
dtm=TermDocumentMatrix(sam)
m=as.matrix(dtm, sparse=TRUE)

# forming the second vector
v <- sort(rowSums(m),decreasing=TRUE)

# tweets about the company
a=read.csv("/home/prajnan/AboutCompanyTweets.csv")
sam=Corpus(DataframeSource(a))
sam <- tm_map(sam, toSpace, "|")
sam <- tm_map(sam, toSpace, "/")
sam <- tm_map(sam, toSpace, "@")
sam <- tm_map(sam, content_transformer(tolower))
sam <- tm_map(sam, removeNumbers)
sam <- tm_map(sam, removeWords, stopwords("english"))
sam <- tm_map(sam, removePunctuation)
sam <- tm_map(sam, stripWhitespace)
sam <- tm_map(sam, stemDocument)
dtm=TermDocumentMatrix(sam)
dtm <- removeSparseTerms(dtm, sparse=0.95)
m2 <- as.matrix(dtm)

# forming the third vector
w=sort(rowSums(m2),decreasing=TRUE)

# combining second and third vectors
Mat=cbind(v,w)

# combining first and third vectors
Mat1=cbind(v1,w)

# performing the chi-square tests separately
chisq.test(Mat, correct=TRUE)
chisq.test(Mat1, correct=TRUE)

The output of the above code is shown in the figures below.

[Figure: square test.png]

[Figure: square-2.png]

The p-value for the χ² statistic in both cases is very low, i.e. less than 2.2 × 10^-16. At the usual significance levels (for example 0.05) we therefore reject the null hypothesis that word usage is independent of the source of the tweets: the word frequencies in the tweets about the company differ significantly from those in the random sample of tweets.
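One caveat about the analytical process: cbind pairs two sorted frequency vectors by position, not by word, and recycles the shorter vector when the lengths differ, so the rows of the resulting matrix do not correspond to single words. A minimal sketch of aligning the counts by word before running the test is given below; the counts and the helper name align are hypothetical and serve only as an illustration.

```r
# Made-up word counts, for illustration only.
random_counts  <- c(http = 230, tco = 195, new = 65, just = 40, like = 38)
company_counts <- c(http = 90, mercedesbenz = 150, new = 20, car = 60)

# Common vocabulary of both sets of tweets.
words <- union(names(random_counts), names(company_counts))

# Reorder a count vector to the common vocabulary, using 0 for missing words.
align <- function(counts, words) {
  out <- counts[words]
  out[is.na(out)] <- 0
  names(out) <- words
  out
}

# Contingency table: one row per word, one column per source of tweets.
tab <- cbind(random  = align(random_counts, words),
             company = align(company_counts, words))
print(tab)

# Chi-square test of independence between word and source.
result <- chisq.test(tab)
print(result$p.value)
```

With the rows aligned in this way, a small p-value can be read as evidence that the two sources use these words with different relative frequencies.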

The userTimeline function from the twitteR package was used to obtain the latest tweets from the company. These tweets were combined with the tweets about the company obtained in part 2, and the resulting file was then clustered using the k-means algorithm with k = 2. The R code used was:

library(httr)
library(devtools)
library(httpuv)
library(twitteR)

# authorizing with twitter
setup_twitter_oauth("api_key","api_secret", access_token=NULL,access_secret=NULL)

# fetching the latest tweets from the company
tw1=userTimeline("MercedesBenz",n=1000)

# convert the output to a data frame and write it to a csv file
df <- do.call("rbind", lapply(tw1, as.data.frame))
write.csv(df,file="FromCompanyTweets.csv")

library(tm)

a=read.csv("/home/prajnan/AboutCompanyTweets.csv")
b=read.csv("/home/prajnan/FromCompanyTweets.csv")

# combining the tweets from and about the company
d=rbind(a,b)
write.csv(d,file="d.csv")

randomSample=read.csv("/home/prajnan/d.csv",header=TRUE)
sam=Corpus(DataframeSource(randomSample))

# function that replaces a pattern with a space
# (fixed = TRUE so that "|" is matched literally, not as a regex operator)
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x, fixed = TRUE))

sam <- tm_map(sam, toSpace, "|")
sam <- tm_map(sam, toSpace, "/")
sam <- tm_map(sam, toSpace, "@")
sam <- tm_map(sam, content_transformer(tolower))
sam <- tm_map(sam, removeNumbers)
sam <- tm_map(sam, removeWords, stopwords("english"))
sam <- tm_map(sam, removePunctuation)
sam <- tm_map(sam, stripWhitespace)
sam <- tm_map(sam, stemDocument)
dtm=TermDocumentMatrix(sam)
dtm <- removeSparseTerms(dtm, sparse=0.95)
m2 <- as.matrix(dtm)

# transposing the matrix so that documents are rows, as kmeans expects
m3 <- t(m2)

# setting the seed
set.seed(122)

# setting k
k <- 2

# performing k-means clustering
kmeansResult <- kmeans(m3, k)

# getting the cluster centres
round(kmeansResult$centers, digits = 3)

The result of the clustering is shown in the figure below.

[Figure: fr part 3.png]

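The clustering results show that 'fals' and 'true' are the main terms, with average frequencies of 4 and 0, and 3 and 1, respectively. These are the stems of 'false' and 'true', which suggests that they come from the logical metadata columns of the tweet data frames (such as favorited or isRetweet) rather than from the tweet text itself.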
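The proportion of company tweets in each cluster, which the specification asks for, could be computed from the cluster assignments along the following lines. This is a sketch on an invented toy matrix: the term names, the counts and the origin labels are assumptions made for illustration, not the actual data.

```r
set.seed(122)

# Toy document-term matrix: 6 tweets (rows) x 2 terms (columns); values invented.
dtm <- rbind(c(5, 0), c(4, 0), c(4, 1),
             c(0, 4), c(1, 5), c(0, 5))
colnames(dtm) <- c("car", "race")

# Invented labels saying whether each tweet came from the company or the public.
origin <- c("company", "company", "public",
            "public", "public", "company")

# k-means with k = 2; several random starts make the result more stable.
clusters <- kmeans(dtm, centers = 2, nstart = 10)$cluster

# Cross-tabulate cluster against origin, then convert each row to proportions.
counts <- table(cluster = clusters, origin = origin)
prop_by_cluster <- prop.table(counts, margin = 1)

# Share of company tweets within each cluster.
prop_company <- prop_by_cluster[, "company"]
print(round(prop_company, 3))
```

The cluster numbers themselves are arbitrary labels, so only the proportions within each cluster are meaningful, not whether a given group is called cluster 1 or cluster 2.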

The results above suggest that the company MercedesBenz is not especially popular among the general public, and that there are very few words that the company and the public both commonly use.


