import - R - Text Mining - Importing a Corpus and keeping the file names in document term matrix -


The code shown below allows me to import a series of .txt documents stored in a local folder into R, to create a corpus, to pre-process it and finally to convert it to a document term matrix. The problem I have is that the document names are not being imported; instead, each document is listed as 'character(0)'.

One of my goals is to run topic modelling on the corpus, so it is important that I can relate the document names to the topics that the model produces.

Does anyone have a suggestion as to what has changed?

    library("tm")
    library("SnowballC")

    setwd("C:/User/Document/Dataset/")
    corpus <- Corpus(DirSource("blog"))

    # pre_processing
    myStopwords <- c(stopwords("english"))
    your_corpus <- tm_map(corpus, tolower)
    your_corpus <- tm_map(your_corpus, removeNumbers)
    your_corpus <- tm_map(your_corpus, removeWords, myStopwords)
    your_corpus <- tm_map(your_corpus, stripWhitespace)
    your_corpus <- tm_map(your_corpus, removePunctuation)
    your_corpus <- tm_map(your_corpus, stemDocument)
    your_corpus <- tm_map(your_corpus, PlainTextDocument)

    # creating a document term matrix
    myDtm <- DocumentTermMatrix(your_corpus, control = list(wordLengths = c(3, Inf)))
    inspect(myDtm)

Here is a debugging session to identify and correct the loss of the file names. The tolower line was modified, and the PlainTextDocument line was commented out, because these lines dropped the file information. Also, if you check ds$reader, you can see that the baseline reader already creates a plain text document.
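The effect of those two kinds of step can be reproduced on a single document. This is only a minimal sketch, assuming a tm version (0.6 or later) that provides `content_transformer()`; the names `txt_dir`, `corp`, `doc`, `lowered` and `rebuilt` are illustrative and not part of the original code, `VCorpus()` is used only to make the corpus type explicit, and rebuilding the document from its text is an approximation of what the `PlainTextDocument` step does.

```r
library(tm)

# Sample texts that ship with the tm package
txt_dir <- system.file("texts", "txt", package = "tm")
corp <- VCorpus(DirSource(txt_dir))

doc <- corp[[1]]
original_id <- meta(doc, "id")   # id set by the reader from the file name

# content_transformer() changes only the content and hands back the same
# document object, so its metadata is carried along
lowered <- content_transformer(tolower)(doc)
lowered_id <- meta(lowered, "id")

# A PlainTextDocument built from bare text starts with empty metadata
# (assumption: this mirrors what the PlainTextDocument step does in tm_map)
rebuilt <- PlainTextDocument(content(doc))
rebuilt_id <- meta(rebuilt, "id")

print(original_id)
print(lowered_id)
print(rebuilt_id)
```

Comparing the three printed ids shows which kind of transformation keeps the file name and which one starts from an empty id.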

    # Debugging session: call meta() after every step to see where the file name is lost
    library("tm")
    library("SnowballC")

    # corpus <- Corpus(DirSource("blog"))
    sf <- system.file("texts", "txt", package = "tm")
    ds <- DirSource(sf)
    your_corpus <- Corpus(ds)

    # Check the status with the following line
    meta(your_corpus[[1]])

    # pre_processing
    myStopwords <- c(stopwords("english"))
    # your_corpus <- tm_map(your_corpus, tolower)
    your_corpus <- tm_map(your_corpus, content_transformer(tolower))
    meta(your_corpus[[1]])
    your_corpus <- tm_map(your_corpus, removeNumbers)
    meta(your_corpus[[1]])
    your_corpus <- tm_map(your_corpus, removeWords, myStopwords)
    meta(your_corpus[[1]])
    your_corpus <- tm_map(your_corpus, stripWhitespace)
    meta(your_corpus[[1]])
    your_corpus <- tm_map(your_corpus, removePunctuation)
    meta(your_corpus[[1]])
    your_corpus <- tm_map(your_corpus, stemDocument)
    meta(your_corpus[[1]])
    # your_corpus <- tm_map(your_corpus, PlainTextDocument)
    # meta(your_corpus[[1]])

    # creating a document term matrix
    myDtm <- DocumentTermMatrix(your_corpus, control = list(wordLengths = c(3, Inf)))
    inspect(myDtm)
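Once the metadata survives pre-processing, the file names should appear as the document labels of the document term matrix, which is what is needed to tie topics back to documents. A minimal sketch of that check follows, again on the sample texts bundled with tm and with a shortened pipeline (stemming is left out so that SnowballC is not required). The names `txt_dir`, `corp`, `dtm` and `doc_names` are illustrative, and `VCorpus()` is used only to make the corpus type explicit.

```r
library(tm)

# Sample texts that ship with the tm package
txt_dir <- system.file("texts", "txt", package = "tm")
corp <- VCorpus(DirSource(txt_dir))

# Pre-processing that keeps each document's metadata:
# tolower is wrapped in content_transformer(), the rest are tm transformations
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stripWhitespace)

dtm <- DocumentTermMatrix(corp, control = list(wordLengths = c(3, Inf)))

# Docs() returns the document labels of the matrix
doc_names <- Docs(dtm)
print(doc_names)

# Compare the labels with the file names found in the source directory
print(setequal(doc_names, list.files(txt_dir)))
```

If the labels match the file names, the rows of any per-document output derived from the matrix can be matched back to the original files by name.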
