SUBMISSION REQUIREMENTS: Please submit a single R script file named with your "First_Last Name.R" ONLY. Your R script code must calculate the effectiveness of your classification as described below.
Similar to the classification example, process and classify the newsgroup document data. Download this data and save it on your computer in your R packages folder under "tm/text/". Your code MUST access it from there!
Note that the data is separated into one test and one train folder, each containing 20 sub folders on different subjects. Choose these 2 subjects to analyze (sci.space and rec.autos) and 100 documents from each.
Consider "rec.autos" as the positive and "sci.space" as the negative event. Note that kNN syntax expects (Positive first, Negative second).
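The positive-first ordering can be enforced with explicit factor levels. The following is a minimal base R sketch, assuming the 400-row layout given in this assignment (sci.space train, sci.space test, rec.autos train, rec.autos test); the names `train.rows`, `test.rows` and `Tags` are illustrative only.

```r
# Assumed row order of the merged corpus:
#   1:100   sci.space train     101:200 sci.space test
#   201:300 rec.autos train     301:400 rec.autos test
# Put the rec.autos (Positive) rows first in both ranges.
train.rows <- c(201:300, 1:100)
test.rows  <- c(301:400, 101:200)

# Explicit levels keep "Positive" first; without them R would sort
# the levels alphabetically and list "Negative" first.
Tags <- factor(c(rep("Positive", 100), rep("Negative", 100)),
               levels = c("Positive", "Negative"))
table(Tags)
```

Because the row ranges and the tag vector both start with the positive class, the same `Tags` vector can label the training rows and serve as the reference for the test rows.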
Classify the Newsgroups data (by date version data set) from Blackboard:
• Save the data in your "tm/text/" folder so you can specify the path using system.file()
• Note that the data is separated into one test and one train folder, each containing 20 sub folders on different subjects.
Choose these 2 subjects to analyze (sci.space and rec.autos) and 100 documents from each.
• For each subject selected:
– 100 documents for training from the train folder
– 100 documents for testing from the test folder
• Obtain the merged Corpus (of 400 documents), please keep the order as
– Doc1.Train from the "sci.space" newsgroup train data
– Doc1.Test from the "sci.space" newsgroup test data
– Doc2.Train from the "rec.autos" newsgroup train data
– Doc2.Test from the "rec.autos" newsgroup test data
• Implement preprocessing (clearly indicate what you have used)
• Create the Document-Term Matrix using the following arguments (word lengths of at least 2, word frequency of at least 5)
– use: control=list(wordLengths=c(2,Inf), bounds=list(global=c(5,Inf)))
• Split the Document-Term Matrix into proper test/train row ranges
– train range containing rows (1:100) and (201:300)
– test range containing rows (101:200) and (301:400)
– Note that knn expects the positive ("Rec") event first, so re-adjust your train/test range if needed.
• Use the abbreviations "Positive" and "Negative" as tag factors in your classification.
– Check if the tag order is correct using table(Tags)
– You should get
• Positive Negative
• 100 100
– If your order is not correct, make the proper changes.
• Classify text using the kNN() function
• Display classification results as an R dataframe and name the columns as:
– "Predict" - Tag factors of predicted subject (Positive or Negative)
– "Prob" - The classification probability
– "Correct" - TRUE/FALSE
• What is the percentage of correct (TRUE) classifications?
• Estimate the effectiveness of your classification:
– Calculate and clearly mark the values TP, TN, FP, FN
– Create the confusion matrix and name the rows and columns with what is the Positive/Negative event
– Calculate Precision
– Calculate Recall
– Calculate F-score
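The classification and evaluation steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses `knn()` from the `class` package, replaces the real Document-Term Matrix rows with small synthetic matrices (20 rows per class instead of 100), and picks `k = 5` arbitrarily. The helper `make.block` and the variable names other than the required column names are hypothetical.

```r
library(class)  # provides knn()

set.seed(1)
# Synthetic stand-in for Document-Term Matrix rows (NOT the newsgroup data):
# 20 "Positive" rows followed by 20 "Negative" rows, for train and test alike.
make.block <- function(center) matrix(rnorm(20 * 5, mean = center), ncol = 5)
train <- rbind(make.block(2), make.block(0))
test  <- rbind(make.block(2), make.block(0))
Tags  <- factor(rep(c("Positive", "Negative"), each = 20),
                levels = c("Positive", "Negative"))

# prob = TRUE attaches the vote share of the winning class as attribute "prob".
pred <- knn(train, test, cl = Tags, k = 5, prob = TRUE)

# Results table with the column names requested in the assignment.
results <- data.frame(Predict = pred,
                      Prob    = attr(pred, "prob"),
                      Correct = pred == Tags)
pct.correct <- 100 * mean(results$Correct)

# Counts, treating "Positive" as the positive event.
TP <- sum(pred == "Positive" & Tags == "Positive")
FN <- sum(pred == "Negative" & Tags == "Positive")
FP <- sum(pred == "Positive" & Tags == "Negative")
TN <- sum(pred == "Negative" & Tags == "Negative")

# Confusion matrix: rows are the actual class, columns the predicted class.
confusion <- matrix(c(TP, FN, FP, TN), nrow = 2, byrow = TRUE,
                    dimnames = list(Actual    = c("Positive", "Negative"),
                                    Predicted = c("Positive", "Negative")))

precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)
f.score   <- 2 * precision * recall / (precision + recall)
```

Here the test rows are in the same class order as `Tags`, so `Tags` doubles as the ground truth for the test set; with the real data that only holds if the test range is also arranged positive first.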
Note that one way you can select only 100 documents is
> Temp1 <- DirSource(Doc1.TrainPath)
> Doc1.Train <- Corpus(URISource(Temp1$filelist[1:100]),readerControl=list(reader=readPlain))