I am automating Remember The Milk tasks with Zapier, which triggers whenever anything changes in the Atom feed. The problem is that Zapier sends the XHTML-formatted data as plain text, which I am catching using php://input:

<?php
$xhtml = file_get_contents('php://input');
?>

The raw data looks like this:

@class: rtm_due
span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]

@class: rtm_priority
span: [{'#text': 'Priority:', '@class': 'rtm_priority_title'}, {'#text': '1', '@class': 'rtm_priority_value'}]

@class: rtm_tags
span: [{'#text': 'Tags:', '@class': 'rtm_tags_title'}, {'#text': 'gcal-work, github', '@class': 'rtm_tags_value'}]

@class: rtm_location
span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]

@class: rtm_list
span: [{'#text': 'List:', '@class': 'rtm_list_title'}, {'#text': 'Work', '@class': 'rtm_list_value'}]

Let's say I want to extract the due date, Sat 16 Jul 16, under @class: rtm_due. How can I extract it? Would a regex (preg_match) help? If so, how?

2 Answers

You could do this with a function that uses regex and a loop to fetch the data you need. Consider the function below. It may look convoluted, but it is not limited to the date value: it gives you access to every key-value pair in the file, in case you need them at some point.

    <?php
        $file   = __DIR__ . "/file.txt";   //<== THE NAME OF THE FILE CONTAINING YOUR DATA


        /*************** BEGIN FUNCTIONS ***************/
        function parseFile($file){
            $arrFileContent    = [];

            // IF THE FILE DOES NOT EXIST RETURN NULL 
            if(!file_exists($file)){
                return null;
            }
            // GET THE DATA FROM THE FILE & STORE IT IN A VARIABLE
            $strFileDataContent = file_get_contents($file);

            // IF THE FILE CONTAINS NOTHING RETURN NULL AS WELL  
            if(empty($strFileDataContent)){
                return null;
            }

            // SPLIT THE CONTENTS OF THE FILE (STRING) AT THE END OF EACH LINE
            // THUS CREATING AN ARRAY OF LINES OF TEXT-DATA
            $arrFileDataLines   = explode("\n", $strFileDataContent);

            // LOOP THROUGH THE ARRAY PRODUCED ABOVE & PERFORM SOME PATTERN MATCHING
            // AND TEXT EXTRACTION WITHIN THE LOOP

            $keyA = null;   // KEY OF THE MOST RECENT "@class:" LINE

            foreach($arrFileDataLines as $lineData){
                // TRIM STRAY WHITESPACE (INCLUDING "\r" FROM WINDOWS LINE ENDINGS)
                $strKeyInfo = trim($lineData);
                $rxClass    = "#^@class:\s*(.*)$#i";
                $rxSpan     = "#^span:\s*(.+)$#si";

                if(preg_match($rxClass, $strKeyInfo, $matches)) {
                    $val    = $matches[1];
                    $keyA   = str_replace("rtm_", "", $val);
                    if (!array_key_exists($keyA, $arrFileContent)) {
                        $arrFileContent[$keyA] = $val;
                    }
                }
                // A "span:" LINE IS ONLY MEANINGFUL AFTER ITS "@class:" LINE
                if($keyA !== null && preg_match($rxSpan, $strKeyInfo, $matches2)) {
                    $keyB   = $keyA . "Data";
                    if (!array_key_exists($keyB, $arrFileContent)) {
                        $arrFileContent[$keyB] = parseSpanValues($matches2[1], $keyA);
                    }
                }
            }
            return $arrFileContent;
        }

        function parseSpanValues($spanData, $prefix){
            $arrSpanData    = explode(", ",  preg_replace("#[\{\}\[\]\"\'\#\@]#", "", $spanData));
            $objSpanData    = new stdClass();
            $cleanVal       = "";

            if($prefix == "tags"){
                $cnt = 0;
                foreach($arrSpanData as $i=>$val){
                    if(!stristr($val, ":")){
                        $cleanVal  .= ", " . $val ;
                        $cnt++;
                    }
                }
                $arrSpanData[2] = $arrSpanData[2] . $cleanVal;
                array_splice($arrSpanData, 3, $cnt);
            }

            foreach($arrSpanData as $iKey=>$spanVal){
                $arrSplit   = preg_split("#\:\s#", $spanVal);
                $key        = "text";

                if($iKey == 0){
                    $key    = "{$prefix}Text";
                }else if($iKey == 1){
                    $key    = "{$prefix}TextClass";
                }else if($iKey == 2){
                    $key    = "{$prefix}Value";
                }else if($iKey == 3){
                    $key    = "{$prefix}ValueClass";
                }
                if(isset($arrSplit[1])){
                    $objSpanData->$key  = $arrSplit[1];
                }
            }
            return $objSpanData;
        }
        /*************** END OF FUNCTIONS ***************/



        var_dump(parseFile($file));
        // PRODUCES SOMETHING LIKE: 
        array (size=10)
          'due' => string 'rtm_due' (length=7)
          'dueData' => 
            object(stdClass)[1]
              public 'dueText' => string 'Due' (length=3)
              public 'dueTextClass' => string 'rtm_due_title' (length=13)
              public 'dueValue' => string 'Sat 16 Jul 16' (length=13)
              public 'dueValueClass' => string 'rtm_due_value' (length=13)
          'priority' => string 'rtm_priority' (length=12)
          'priorityData' => 
            object(stdClass)[2]
              public 'priorityText' => string 'Priority' (length=8)
              public 'priorityTextClass' => string 'rtm_priority_title' (length=18)
              public 'priorityValue' => string '1' (length=1)
              public 'priorityValueClass' => string 'rtm_priority_value' (length=18)
          'tags' => string 'rtm_tags' (length=8)
          'tagsData' => 
            object(stdClass)[3]
              public 'tagsText' => string 'Tags' (length=4)
              public 'tagsTextClass' => string 'rtm_tags_title' (length=14)
              public 'tagsValue' => string 'gcal-work, github, stack-overflow' (length=33)
              public 'text' => string 'rtm_tags_value' (length=14)
          'location' => string 'rtm_location' (length=12)
          'locationData' => 
            object(stdClass)[4]
              public 'locationText' => string 'Location' (length=8)
              public 'locationTextClass' => string 'rtm_location_title' (length=18)
              public 'locationValue' => string 'none' (length=4)
              public 'locationValueClass' => string 'rtm_location_value' (length=18)
          'list' => string 'rtm_list' (length=8)
          'listData' => 
            object(stdClass)[5]
              public 'listText' => string 'List' (length=4)
              public 'listTextClass' => string 'rtm_list_title' (length=14)
              public 'listValue' => string 'Work' (length=4)
              public 'listValueClass' => string 'rtm_list_value' (length=14)

As it stands, if you want the due date from the resulting array (the dueData element), you can simply do something like this:

    <?php
        $data         = parseFile($file);
        $dueDateValue = $data['dueData']->dueValue;

        var_dump($dueDateValue);  // PRODUCES:: 'Sat 16 Jul 16'
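
Once you have the raw string, you will probably want a real date rather than text. The following is only a minimal sketch: it assumes the feed always uses the `Sat 16 Jul 16` shape (short day name, day, short month, two-digit year), and `$dueRaw` is a hypothetical variable standing in for `$data['dueData']->dueValue`.

```php
<?php
// Hypothetical input: stands in for $data['dueData']->dueValue.
$dueRaw = 'Sat 16 Jul 16';

// "!" resets every field not named in the format (so the time is 00:00:00).
// D = short day name, d = day of month, M = short month name, y = two-digit year.
$due = DateTime::createFromFormat('!D d M y', $dueRaw);

if ($due === false) {
    // The string did not match the assumed format (e.g. a task with no due date).
    $dueIso = null;
} else {
    $dueIso = $due->format('Y-m-d');
}

var_dump($dueIso);
```

If the feed ever sends a different date layout, `createFromFormat` returns `false`, so the `null` branch is the place to add a fallback.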

Hope this gives you a rough idea of how to adapt it to your own needs.

Cheers & Good Luck!!!

3 Comments

Wow!...Awesome. This one is even better. Thanks. :)
The only remaining problem is getting the tags. The field has multiple comma-separated values (i.e. gcal-work, github), and the above code only gives me the first value. Maybe I can explode them into an array and then parse them with foreach?
@Khurshid Alam The issue has been fixed and the Post updated. You only need to copy the 2nd Function: function parseSpanValues($spanData, $prefix){...} That's where the fix was made. Again, Cheers & Good-Luck... ;-)
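
For the explode-and-loop idea raised in the comments, a minimal sketch might look like the following. `$tagsRaw` is a hypothetical variable standing in for `$data['tagsData']->tagsValue`, and the sketch assumes that tag names themselves never contain commas.

```php
<?php
// Hypothetical input: stands in for $data['tagsData']->tagsValue.
$tagsRaw = 'gcal-work, github';

// Split on commas, trim surrounding whitespace, and drop empty entries.
$tags = array_values(array_filter(array_map('trim', explode(',', $tagsRaw)), 'strlen'));

foreach ($tags as $tag) {
    echo $tag, PHP_EOL;
}
```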

I think this regex will work:

@class:\s*rtm_due\nspan:\s*\[{.*}, {'#text':\s*(.*),\s*'@class':\s*'rtm_due_value'}]

Demo here, but only for the due date.

If you want the location, you need to change the regex to:

@class:\s*rtm_location\nspan:\s*\[{.*}, {'#text':(.*),\s*'@class':\s*'rtm_location_value'}]

Group 1 should then hold the desired value (including its surrounding single quotes).

This is the output I got in one of the PHP regex testers available online:

    [0] => @class: rtm_due
    span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]
    [1] => 'Sat 16 Jul 16'
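
Rather than editing the pattern by hand for each field, the same idea can be wrapped in a small function. This is only a sketch, not the pattern from the answer: `extractRtmValue` is a hypothetical helper name, the pattern is a variant that captures the text without its quotes, and it assumes that values never contain a single quote.

```php
<?php
/**
 * Hypothetical helper: returns the text of the "rtm_<field>_value" span,
 * or null when the field is not present in the payload.
 * Assumes values never contain a single quote.
 */
function extractRtmValue(string $raw, string $field): ?string
{
    $f = preg_quote($field, '~');
    $pattern = "~@class:\s*rtm_{$f}\s*\R"            // the "@class: rtm_<field>" line
             . "span:.*?'#text':\s*'([^']*)',\s*"    // quoted text, captured without quotes
             . "'@class':\s*'rtm_{$f}_value'~";      // must belong to the value span

    return preg_match($pattern, $raw, $m) ? $m[1] : null;
}

// Sample payload copied from the question; in the real script this would be
// $raw = file_get_contents('php://input');
$raw = "@class: rtm_due\n"
     . "span: [{'#text': 'Due:', '@class': 'rtm_due_title'}, {'#text': 'Sat 16 Jul 16', '@class': 'rtm_due_value'}]\n"
     . "\n"
     . "@class: rtm_location\n"
     . "span: [{'#text': 'Location:', '@class': 'rtm_location_title'}, {'#text': 'none', '@class': 'rtm_location_value'}]\n";

var_dump(extractRtmValue($raw, 'due'));
var_dump(extractRtmValue($raw, 'location'));
```

The `\R` in the pattern matches any line-ending style, which may matter if the payload arrives with `\r\n` line breaks.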

1 Comment

Thanks. I checked, and it works! However, the answer given by Poiz is even better, so I am accepting that one as the answer.
