0

I need a faster way to parse XML into an array (without empty values).

Until now I have been parsing XML to an array with the Array2XML library (by Lalit Patel), but it was the bottleneck of my script. Looking for a way to speed it up, I found an approach that is about 15x faster:

class SimpleXmlDecoder
{

    public function decode(string $xml): array
    {
        try {
            $decoded = json_decode(json_encode(
                simplexml_load_string($xml, "SimpleXMLElement", LIBXML_NOCDATA)
            ),TRUE);

            if (empty($decoded)) {
                return [];
            }

            return self::mapEmptyArraysElementsToEmptyString($decoded);
        } catch (\Exception $exception) {
            return [];
        }
    }

    private static function mapEmptyArraysElementsToEmptyString($array): array
    {
        return array_map(
            static function($value) {
                if (!is_array($value)) {
                    return $value;
                }

                if (empty($value)) {
                    return '';
                }

                return self::mapEmptyArraysElementsToEmptyString($value);
            },
            $array
        );
    }

}

This is fast enough for now, but it could become a bottleneck in the future. Do you know a faster way to do it?

Edit: each XML document is 100 kB to 1 MB. I need the values of ALL NON-EMPTY elements, with name and value.
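For readers wondering why the decoder above needs the extra mapping pass, here is a minimal sketch of what the SimpleXML and JSON round trip does to empty elements. The sample document is made up for illustration; it is not one of the real inputs.

```php
<?php
// Hypothetical sample: <b> and <d> are empty elements.
$xml = '<root><a>1</a><b></b><c><d/></c></root>';

// Same round trip as in SimpleXmlDecoder::decode()
$decoded = json_decode(json_encode(simplexml_load_string($xml)), true);

// Empty elements are encoded as empty JSON objects, which decode to
// empty arrays rather than empty strings.
var_export($decoded);
```

The empty arrays are what mapEmptyArraysElementsToEmptyString() later turns into empty strings.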

2
  • What is the requirement for processing the XML file - simply return values from ALL elements? Return name and value? Return all attributes? How large are these XML files? Commented Jun 7, 2019 at 11:01
  • Why do you need it as an array? You would probably be better off processing the XML directly. Commented Jun 7, 2019 at 17:24

3 Answers

1

I quickly put together the xmlparser class below, which uses the RecursiveDOMIterator class to process an XML file. I do not know whether it will be faster than your original code. It seems fairly brisk when processing files locally: it worked through a very complex 8 MB XML file in 2.4 s, and it zips through smaller files. I'd be interested to know how it performs in comparison.

<?php

    class RecursiveDOMIterator implements RecursiveIterator {
        /*
            https://github.com/salathe/spl-examples/wiki/RecursiveDOMIterator
        */
        private $index;
        private $list;

        public function __construct(DOMNode $domNode){
            $this->index = 0;
            $this->list = $domNode->childNodes;
        }
        public function current(){
            return $this->list->item($this->index);
        }
        public function getChildren(){
            return new self( $this->current() );
        }
        public function hasChildren(){
            return $this->current()->hasChildNodes();
        }
        public function key(){
            return $this->index;
        }
        public function next(){
            $this->index++;
        }
        public function rewind(){
            $this->index = 0;
        }
        public function valid(){
            return $this->index < $this->list->length;
        }
    }//end class


    class xmlparser{
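        private static $instance=false;
        private $start;
        private $dom;
        private $duration;  // set by parse(), read through __get()
        private $filesize;  // input size in bytes, read through __get()

        private function __construct( $xml ){
            $this->start=microtime( true );
            libxml_use_internal_errors( true );
            $this->dom=new DOMDocument;
            $this->dom->validateOnParse=true;
            $this->dom->recover=true;
            $this->dom->strictErrorChecking=true;

            // $xml may be a path to a file or a string of XML
            if( is_file( $xml ) ){
                $this->filesize=filesize( $xml );
                $this->dom->load( $xml );
            } else {
                $this->filesize=strlen( $xml );
                $this->dom->loadXML( $xml );
            }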

            libxml_clear_errors();
        }

        private function __clone(){}
        public function __wakeup(){}
        public static function initialise( $xml ){
            if( !self::$instance ) self::$instance=new xmlparser( $xml );
            return self::$instance;
        }

        public function parse(){
            $itr = new RecursiveIteratorIterator( new RecursiveDOMIterator( $this->dom ), RecursiveIteratorIterator::SELF_FIRST );
            $tmp=[];
            foreach( $itr as $node) {
                if( $node->nodeType === XML_ELEMENT_NODE ) {

                    $tag=$node->tagName;
                    $value=$node->nodeValue;

                    if( $value !== '' ){ // empty() would also drop the string "0"
                        $element=[
                            'tag'   =>  $tag,
                            'value' =>  $value
                        ];
                        if( $node->hasAttributes() ){
                            $attributes=[];
                            foreach( $node->attributes as $index => $attr ){
                                $attributes[ $attr->nodeName ]=$attr->nodeValue;
                            }
                            $element['attributes']=$attributes;
                        }
                        $tmp[]=$element;
                    }
                }
            }
            $this->duration=microtime( true ) - $this->start;
            return $tmp;
        }
        public function __get( $name ){
            return $this->$name;
        }
    }//end class



    $file = 'bbc_rss.xml';
    $obj = xmlparser::initialise( $file );
    $data = $obj->parse();
    $time = $obj->duration;
    $size = round( $obj->filesize/pow( 1024, 2 ),2 );



    printf( "Time: %ss\nSize: %sMB", $time, $size );

?>
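To answer the "how does it compare" question on your own data, a small timing harness along these lines might help. This is only a sketch: timeIt(), the synthetic document, and the simplified DOM walk (getElementsByTagName('*') rather than the RecursiveDOMIterator above) are assumptions made for illustration, and hrtime() needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: mean wall-clock time of $fn in milliseconds.
function timeIt(callable $fn, int $runs = 10): float
{
    $start = hrtime(true);                 // monotonic clock, nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $runs / 1e6;
}

// Synthetic input: 2,000 <item> elements, each with one empty <note>.
$xml = '<items>'
     . str_repeat('<item id="1"><title>Hello</title><note></note></item>', 2000)
     . '</items>';

// Approach from the question: SimpleXML plus a JSON round trip.
$viaJson = function () use ($xml) {
    return json_decode(json_encode(
        simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA)
    ), true);
};

// Simplified DOM walk that keeps elements with a non-empty value.
$viaDom = function () use ($xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $out = [];
    foreach ($dom->getElementsByTagName('*') as $node) {
        if ($node->nodeValue !== '') {
            $out[] = ['tag' => $node->tagName, 'value' => $node->nodeValue];
        }
    }
    return $out;
};

printf("SimpleXML + JSON: %.2f ms\n", timeIt($viaJson));
printf("DOM walk:         %.2f ms\n", timeIt($viaDom));
```

The numbers depend heavily on document shape and PHP version, so it is worth running this against a few representative files rather than the synthetic input.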

0

You can use the simplexml_load_string() function to parse XML. See "https://www.w3schools.com/php/php_xml_simplexml_read.asp" to learn more about it.
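As a rough illustration, the sketch below parses a string with simplexml_load_string() and walks the resulting SimpleXMLElement tree directly, collecting non-empty leaf elements as name/value pairs. collectNonEmpty() and the sample document are hypothetical, and children() as called here only visits elements without a namespace.

```php
<?php
// Hypothetical helper: collect non-empty leaf elements as name/value pairs.
function collectNonEmpty(SimpleXMLElement $node, array &$out = []): array
{
    foreach ($node->children() as $child) {
        if ($child->count() > 0) {
            // Element has child elements: descend instead of recording it
            collectNonEmpty($child, $out);
            continue;
        }
        $value = trim((string) $child);
        if ($value !== '') {
            $out[] = ['name' => $child->getName(), 'value' => $value];
        }
    }
    return $out;
}

$xml = '<items>'
     . '<item><title>First</title><note></note></item>'
     . '<item><title>Second</title><note>ok</note></item>'
     . '</items>';

$root = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($root === false) {
    exit("Could not parse XML\n");
}

$pairs = collectNonEmpty($root);
print_r($pairs);
```

This avoids the JSON round trip, at the cost of producing a flat list instead of a nested array.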

3 Comments

It returns SimpleXMLElements not arrays.
Please go through the link mentioned in the answer. It works perfectly there.
Link-only answers are potentially useless: if the linked page changes or is removed, the answer contains no useful content for future readers.
0

You can use a SAX parser; with it you can parse huge files.

A SAX parser reads the XML as a stream of events and manages memory better than the SimpleXML parser and DOM. It does not keep the whole document in memory, so it can be used for very large files. The following example shows how to get data from XML using the SAX API.

Link
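In case the link goes away, here is a minimal sketch of the idea using PHP's event-based XML Parser extension (xml_parser_create() and its handlers). parseWithSax() is a hypothetical helper, and the sketch assumes the whole document is already in a string; it records each element whose own text is non-empty.

```php
<?php
// Hypothetical helper: collect name/value pairs with the event-based parser.
function parseWithSax(string $xml): array
{
    $result = [];   // collected ['name' => ..., 'value' => ...] pairs
    $stack  = [];   // one text buffer per currently open element

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: open a fresh text buffer for this element
        function ($parser, string $name, array $attrs) use (&$stack) {
            $stack[] = '';
        },
        // End tag: record the element if it had any text of its own
        function ($parser, string $name) use (&$stack, &$result) {
            $text = trim(array_pop($stack));
            if ($text !== '') {
                $result[] = ['name' => $name, 'value' => $text];
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$stack) {
            // Text can arrive in several chunks, so append to the buffer
            if ($stack) {
                $stack[count($stack) - 1] .= $data;
            }
        }
    );

    // xml_parse() returns 1 on success and 0 on a parse error
    $ok = xml_parse($parser, $xml, true);

    return $ok === 1 ? $result : [];
}

print_r(parseWithSax('<a><b>x</b><c></c><d>y</d></a>'));
```

For files too large to hold in memory, the same handlers can be fed by calling xml_parse() repeatedly with chunks read from the file, passing true as the last argument only for the final chunk.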

1 Comment

Link-only answers are potentially useless: if the linked page changes or is removed, the answer contains no useful content for future readers.
