
I need to parse a plaintext file that looks like a log:

11/04/2015 11:45:01: James: Cheers guys, enjoy the weekend!

11/04/2015 12:08:55: Sarah: Sounds good James

11/04/2015 12:09:24: Sarah: What are the details of the trip?

11/04/2015 12:19:06: Leah: Driving up on Friday.
Saturday we'll hit the beach.
Sunday paaaaarty!

11/04/2015 12:29:54: James: Nice.

I'm currently parsing by line break:

var messages = data.split('\n');

But this doesn't work where a message contains a line break (see Leah's message above).

What would be the proper way to split the text into entries? Some kind of regular expression that matches the date/time stamp at the start of each entry, as in the sample above?

Grateful for your help.
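A different technique from the regex split being asked about: keep splitting on '\n', but fold every line that does not start with a stamp into the entry before it. This is a minimal sketch, assuming every entry starts with a stamp shaped like `11/04/2015 11:45:01: ` and that lines use `\n` endings; `STAMP_AT_START` and `groupEntries` are hypothetical names.

```javascript
// A line that begins with a stamp such as "11/04/2015 11:45:01: ".
// The digit counts are an assumption based on the sample log.
const STAMP_AT_START = /^\d{1,2}\/\d{1,2}\/\d{4} \d{2}:\d{2}:\d{2}: /;

// Hypothetical helper: a stamped line starts a new entry; any other line
// (including blank ones) is appended to the entry before it.
function groupEntries(data) {
  const entries = [];
  for (const line of data.split('\n')) {
    if (STAMP_AT_START.test(line)) {
      entries.push(line);
    } else if (entries.length > 0) {
      entries[entries.length - 1] += '\n' + line;
    }
    // Lines before the first stamp are ignored.
  }
  // Blank separator lines end up as trailing "\n"s, so trim them off.
  return entries.map((entry) => entry.trimEnd());
}

const sampleLog =
  '11/04/2015 11:45:01: James: Cheers guys, enjoy the weekend!\n\n' +
  "11/04/2015 12:19:06: Leah: Driving up on Friday.\nSaturday we'll hit the beach.\nSunday paaaaarty!\n\n" +
  '11/04/2015 12:29:54: James: Nice.';

console.log(groupEntries(sampleLog).length); // 3
```

As one of the comments below points out, a message that itself contains a pasted log line would still be cut in two here, as it would by any stamp-based approach.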

  • You might want to split on \n\n, or use a more complex check in case a message itself contains that sequence. Commented Aug 31, 2015 at 10:00
  • Before anyone can help you, you need to define what you would like to do if a multiline message contains a pattern that matches a full entry. This happens often in IM-type messages when users copy and send log fragments in a message. Depending on the answer, it's likely that Hacketo is right in that you don't have enough structure as-is. Commented Aug 31, 2015 at 10:18
  • \n\n is a double line break, and that can appear quite often in IM logs, so it won't be strong enough on its own. But matching a double line break followed by a date and then a time would be sufficient, if I can get the regex for that. Commented Aug 31, 2015 at 10:25
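The regex asked for in that last comment could look like the following sketch. It assumes `\n` line endings and the date/time layout seen in the sample; `ENTRY_BREAK` and `splitEntries` are hypothetical names. The lookahead `(?=...)` checks for the stamp without consuming it, so each entry keeps its own stamp.

```javascript
// Split only at a blank line (two line breaks) that is immediately followed
// by a date and a time. The lookahead checks for the stamp without consuming it.
const ENTRY_BREAK = /\n\n(?=\d{1,2}\/\d{1,2}\/\d{4} \d{2}:\d{2}:\d{2}: )/;

function splitEntries(data) {
  return data.split(ENTRY_BREAK);
}

// The blank line inside Leah's message is not followed by a stamp, so it stays put.
const log =
  '11/04/2015 12:09:24: Sarah: What are the details of the trip?\n\n' +
  "11/04/2015 12:19:06: Leah: Driving up on Friday.\n\nSaturday we'll hit the beach.";

console.log(splitEntries(log).length); // 2
```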

2 Answers


I think what you can try here is this:

Each entry starts with a date stamp, so take everything after one stamp as a single string, up to where the next stamp begins.

Don't split on \n; split on the date stamp instead, which is in mm/dd/yyyy hh:mm:ss: format.

The logic needs to treat your text as having this shape:

Date stamp >> content << next date stamp
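The "stamp >> content << next stamp" shape can also be matched directly instead of splitting. This is a sketch, not the answerer's code, assuming ES2020 `String.prototype.matchAll` and the same stamp layout as the sample; `STAMP`, `ENTRY`, and `entriesBetweenStamps` are hypothetical names.

```javascript
// Stamp layout assumed from the sample: "11/04/2015 11:45:01".
const STAMP = String.raw`\d{1,2}\/\d{1,2}\/\d{4} \d{2}:\d{2}:\d{2}`;

// Group 1: the stamp. Group 2: the content, matched lazily ([\s\S] also
// matches "\n") up to optional whitespace that is followed either by a
// line break plus the next stamp, or by the end of the input.
const ENTRY = new RegExp(
  String.raw`(${STAMP}): ([\s\S]*?)\s*(?=\n${STAMP}: |$)`,
  'g'
);

function entriesBetweenStamps(data) {
  return Array.from(data.matchAll(ENTRY), (m) => ({ stamp: m[1], content: m[2] }));
}

const demo =
  "11/04/2015 12:19:06: Leah: Driving up on Friday.\nSaturday we'll hit the beach.\n\n" +
  '11/04/2015 12:29:54: James: Nice.';

console.log(entriesBetweenStamps(demo).length); // 2
```

The lazy `*?` is what stops each match at the next stamp; a greedy `*` would swallow every following entry into the first one.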

You can build your own regular expression using this guide: http://www.w3schools.com/jsref/jsref_obj_regexp.asp

Or try this regular expression to find the stamps: /[0-9]+\/[0-9]+\/[0-9]* [0-9]*\:[0-9]*\:[0-9]*\:/g



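var re = /[0-9]+\/[0-9]+\/[0-9]* [0-9]*\:[0-9]*\:[0-9]*\:/g;
var str = '11/04/2015 11:45:01: James: Cheers guys, enjoy the weekend!\n\n11/04/2015 12:08:55: Sarah: Sounds good James\n\n11/04/2015 12:09:24: Sarah: What are the details of the trip?\n\n11/04/2015 12:19:06: Leah: Driving up on Friday.\nSaturday we\'ll hit the beach.\nSunday paaaaarty!\n\n11/04/2015 12:29:54: James: Nice.';
var starts = []; // index at which each date stamp begins
var m;

while ((m = re.exec(str)) !== null) {
    // Standard guard against an endless loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // m[0] is the matched stamp, e.g. "11/04/2015 11:45:01:"
    starts.push(m.index);
}

// Each entry runs from one stamp up to the start of the next one.
var entries = starts.map(function (start, i) {
    return str.slice(start, starts[i + 1]).trim();
});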

2 Comments

Actually, a combination of both is what I'm looking for: a double line break followed by a date and then a time. Any ideas?
OK, I did it using the x(?=y) lookahead syntax (see developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/…), like this: var messages = data.split(/\n(?=[0-9]+\/[0-9]+\/[0-9]* [0-9]*\:[0-9]*\:[0-9]*\:)/g); which splits on a line break only when it is followed by the date-time stamp. Awarding this answer since it was the closest.
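Once the entries are split that way, each one can be broken into its timestamp, sender, and text. This is a minimal sketch that reuses the split regex from the comment above; `parseLog`, `ENTRY_FIELDS`, and the `{ time, name, text }` shape are hypothetical, and it assumes sender names never contain a colon.

```javascript
// The split from the comment above: break on "\n" only when a stamp follows.
const SPLIT_AT_STAMP = /\n(?=[0-9]+\/[0-9]+\/[0-9]* [0-9]*\:[0-9]*\:[0-9]*\:)/g;

// Stamp, then sender (no colons), then the rest of the entry, which may span lines.
const ENTRY_FIELDS = /^(\d+\/\d+\/\d+ \d+:\d+:\d+): ([^:]+): ([\s\S]*)$/;

function parseLog(data) {
  return data
    .split(SPLIT_AT_STAMP)
    // With "\n\n" between entries, the first "\n" of each pair is left behind.
    .map((entry) => entry.trim())
    .map((entry) => {
      const m = ENTRY_FIELDS.exec(entry);
      // Anything that does not fit the pattern (e.g. text before the first stamp) is dropped.
      return m ? { time: m[1], name: m[2], text: m[3] } : null;
    })
    .filter((msg) => msg !== null);
}

const parsed = parseLog(
  "11/04/2015 12:19:06: Leah: Driving up on Friday.\nSaturday we'll hit the beach.\n\n" +
  '11/04/2015 12:29:54: James: Nice.'
);
console.log(parsed[1].name); // James
```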

I think you can use a regex like this:

/^[\d\/ :]+:[^:]+:(.*)|(.*)$/gm

Then you can use its capture groups in a substitution: $1 and $2.

[Regex Demo]
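In that regex, $1 holds the text after the second colon on a stamped line and $2 holds any other line, so the stamp and sender are not captured, which matches what the comment below reports. The following is a sketch of a variant, not the answerer's code, that adds groups for the stamp and sender and folds other lines into the previous message. `LINE` and `parseLines` are hypothetical names; it assumes ES2020 `matchAll`, and blank lines inside a message are dropped because `(.+)` needs at least one character.

```javascript
// Groups: 1 = stamp, 2 = sender, 3 = first line of text, 4 = any other non-empty line.
// The "m" flag makes ^ and $ match at every line break.
const LINE = /^(\d+\/\d+\/\d+ \d+:\d+:\d+): ([^:\n]+): (.*)$|^(.+)$/gm;

function parseLines(data) {
  const messages = [];
  for (const m of data.matchAll(LINE)) {
    if (m[1] !== undefined) {
      messages.push({ time: m[1], name: m[2], text: m[3] });
    } else if (messages.length > 0) {
      // Not a stamped line: treat it as a continuation of the previous message.
      messages[messages.length - 1].text += '\n' + m[4];
    }
  }
  return messages;
}

const lines = parseLines(
  '11/04/2015 12:09:24: Sarah: What are the details of the trip?\n\n' +
  "11/04/2015 12:19:06: Leah: Driving up on Friday.\nSaturday we'll hit the beach."
);
console.log(lines[0].name); // Sarah
```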

1 Comment

I tried var messages=data.split(/^[\d\/ :]+:[^:]+:(.*)$|(.*)$/); but that didn't work. Also, the matches in your regex demo are removing the people's names (e.g. James)
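That split attempt runs into a general property of `String.prototype.split`: it removes whatever its regex matches, apart from capturing groups, which are spliced back into the result array. So a pattern written to match the messages themselves, rather than the boundaries between them, does not yield the messages as array elements in the way one might expect. A small illustration on made-up input:

```javascript
// A plain separator: the matched digits are removed from the result.
const plain = 'a1b2c'.split(/\d/);
console.log(plain); // [ 'a', 'b', 'c' ]

// A capturing group: the captured digits are spliced back in between the pieces.
const captured = 'a1b2c'.split(/(\d)/);
console.log(captured); // [ 'a', '1', 'b', '2', 'c' ]

// Two alternative groups: the group that did not take part comes back as undefined.
const mixed = 'a1b'.split(/(\d)|(x)/);
console.log(mixed); // [ 'a', '1', undefined, 'b' ]
```

This is also why the lookahead split in the comments on the first answer works: its separator is only the "\n", and the lookahead consumes none of the stamp, so each stamp stays at the start of its entry.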
