java - How to extract the body from a text file containing an e-mail [Enron Data Set]


I have the Enron e-mail data set as a folder containing e-mails as text files, and I want to extract the "body" portion of those e-mails.

The problem is that fields such as the sender's and the receiver's e-mail addresses are labelled with From:, To:, Cc:, etc., but the body does not start with any label; it simply begins after all the other fields have been listed.
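In standard Internet mail format (RFC 5322) the header block ends at the first empty line, so an unlabelled body can be located by that blank line. Below is a minimal sketch that assumes the data set files keep that blank line; `EmailBodyExtractor` and `extractBody` are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; assumes headers and body are separated by one empty line. */
final class EmailBodyExtractor {

    // Two consecutive line breaks (LF or CRLF) mark the empty line that ends the header block.
    private static final Pattern HEADER_END = Pattern.compile("\\r?\\n\\r?\\n");

    private EmailBodyExtractor() {}

    /** Returns the trimmed text after the first empty line, or "" if no empty line exists. */
    static String extractBody(String rawMessage) {
        Matcher m = HEADER_END.matcher(rawMessage);
        if (!m.find()) {
            return ""; // headers only, or a file that does not follow the assumed layout
        }
        return rawMessage.substring(m.end()).trim();
    }
}
```

Calling `EmailBodyExtractor.extractBody(text)` on the full content of one file would return everything below the header fields, including any quoted earlier messages of a thread.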

A single text file may also contain several bodies (in the case of e-mail threads / conversations). I want to extract the body (or bodies) from these files. Is there a Java API that can do this, and if so, how? This is an offline data set: the e-mails are plain text files on my hard disk drive, not on the internet.

A file looks like this:

  Message-ID: <16159836.1075855377439.JavaMail.evans@thyme>
  Date: Friday, December 7, 2001 10:06:42 -0800 (PST)
  From: heather.dunton@enron.com
  To: k.allen@enron.com
  Subject: RE: West Position
  Mime-Version: 1.0
  Content-Type: text/plain; charset=us-ascii
  Content-Transfer-Encoding: 7bit
  X-From: Dunton, Heather </O=ENRON/OU=NA/CN=RECIPIENTS/CN=HDUNTON>
  X-To: Allen, Philip K. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Pallen>
  X-cc:
  X-bcc:
  X-Folder: \Philip_Allen_Jan2002_1\Allen, Philip K.\Inbox
  X-Origin: Allen-P
  X-FileName: pallen (Non-Privileged).pst

  Please tell me if you still need a curve to slip off.

  Thanks,
  Heather

  -----Original Message-----
  From: Allen, Philip K.
  Sent: Friday, December 07, 2001 5:14 AM
  To: Dunton, Heather
  Subject: RE: West Position

  Heather,

  Did you attach the file to this email?

  -----Original Message-----
  From: Dunton, Heather
  Sent: Wednesday, December 05, 2001 1:43 PM
  To: Allen, Philip K.; Belden, Tim
  Subject: FW: West Position

  Attached is the Delta position for 1/16, 1/30, 6/19, 7/13, 9/21

  -----Original Message-----
  From: Allen, Philip
  Sent: Wednesday, December 05, 2001 6:41 AM
  To: Dunton, Heather
  Subject: RE: West Position

  Heather,

  This is what we need. Is it possible to add the first day for each date given below in the pivot table? We also need the pre-dates to end the posts to verify curve changes on the dates below.

  Thanks,
  Philip Allen

  -----Original Message-----
  From: Dunton, Heather
  Sent: Tuesday, December 04, 2001 3:12 PM
  To: Belden, Tim; Allen, Philip K.
  Cc: Driscell, Michael M.
  Subject: West Position

  Attached is the position for 1/18, 1/31, 6/20, 7/16, 9/24
  << File: west_delta_pos.xls >>

  Let me know if you have any questions.

  Heather
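For threads like the sample above, one possible approach is to cut the body at each "Original Message" separator and then drop the quoted From:/Sent:/To:/Cc:/Subject: lines that follow it. The sketch below assumes every quoted message is introduced by such a separator and that each quoted header fits on one line; wrapped recipient lists or other reply styles would need extra handling. `ThreadSplitter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a thread body into the individual message bodies. */
final class ThreadSplitter {

    // Matches "-----Original Message-----" as well as "----- Original Message -----".
    private static final Pattern SEPARATOR =
            Pattern.compile("-{2,}\\s*Original Message\\s*-{2,}", Pattern.CASE_INSENSITIVE);

    // Quoted header lines that are assumed to follow each separator.
    private static final Pattern QUOTED_HEADER =
            Pattern.compile("(From|Sent|To|Cc|Bcc|Subject):.*", Pattern.CASE_INSENSITIVE);

    private ThreadSplitter() {}

    /** Returns the newest body first, followed by each quoted body in file order. */
    static List<String> splitBodies(String body) {
        List<String> bodies = new ArrayList<>();
        String[] parts = SEPARATOR.split(body);
        for (int i = 0; i < parts.length; i++) {
            // The first part is the newest message and has no quoted header lines.
            String text = (i == 0) ? parts[i] : stripQuotedHeaders(parts[i]);
            text = text.trim();
            if (!text.isEmpty()) {
                bodies.add(text);
            }
        }
        return bodies;
    }

    /** Skips leading blank lines and the run of quoted header lines after a separator. */
    static String stripQuotedHeaders(String part) {
        String[] lines = part.split("\\r?\\n");
        int i = 0;
        while (i < lines.length && lines[i].trim().isEmpty()) {
            i++;
        }
        while (i < lines.length && QUOTED_HEADER.matcher(lines[i].trim()).matches()) {
            i++;
        }
        StringBuilder sb = new StringBuilder();
        for (; i < lines.length; i++) {
            sb.append(lines[i]).append('\n');
        }
        return sb.toString();
    }
}
```

`ThreadSplitter.splitBodies(body)` expects the text below the top-level headers and returns one string per message in the conversation.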

Please provide an example file, the most complex one if possible. The job would be to open each file programmatically, parse its content, and extract the body of the e-mail. Where do you want to store it? Which OS are you running?
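The open / parse / extract loop described in this comment could be sketched with `java.nio.file` alone. The layout is an assumption: every regular file under an input folder is treated as one e-mail whose headers end at the first empty line, and each body is written to a mirrored path under an output folder with a `.body` suffix. `EnronBodyExporter`, its methods, and the `.body` suffix are hypothetical choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical batch job: reads every file under inputRoot and stores its body under outputRoot. */
final class EnronBodyExporter {

    private EnronBodyExporter() {}

    /** Mirrors the file's path relative to inputRoot under outputRoot and appends ".body". */
    static Path outputPathFor(Path inputRoot, Path outputRoot, Path file) {
        Path relative = inputRoot.relativize(file);
        return outputRoot.resolve(relative.toString() + ".body");
    }

    /** Returns the trimmed text after the first empty line, or "" if there is none. */
    static String bodyOf(String raw) {
        String normalized = raw.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n");
        return split < 0 ? "" : normalized.substring(split + 2).trim();
    }

    static void exportAll(Path inputRoot, Path outputRoot) throws IOException {
        // Collect the file list first so files written during the run are not picked up again.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(inputRoot)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            // ISO-8859-1 maps every byte to exactly one char, so stray non-ASCII bytes survive the round trip.
            String raw = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            Path target = outputPathFor(inputRoot, outputRoot, file);
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.write(target, bodyOf(raw).getBytes(StandardCharsets.ISO_8859_1));
        }
    }
}
```

A call such as `EnronBodyExporter.exportAll(Paths.get("maildir"), Paths.get("bodies"))` would process the whole folder; both folder names here are placeholders. Nothing in the sketch is OS-specific, since `Path` handles the separator differences.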
