
I have a large static array-of-arrays which represents a tree structure with about ~100k nodes. This array is a read-only reference used for value lookups.

Now, which of these methods will perform better?

First, a plain array definition in a pure PHP file, included on every request.

Second, serialize the array, gzip the serialized output, and on every request load the gzipped file and unserialize it.

Or convert the array to SQLite or something similar, but the storage must allow fast lookup of a long "ID path", i.e. 1->5556->123->45->455->Node_name (the plain PHP table actually does this very well).

The memory limit on the server is not a problem.
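Roughly, the first two variants would look something like this (file names here are just placeholders, not part of the actual setup):

    // Variant 1: plain PHP include; tree.php contains "<?php return array( ... );"
    $tree = include __DIR__ . '/tree.php';

    // Variant 2: gzipped serialized blob, generated once offline with
    //   file_put_contents('tree.ser.gz', gzcompress(serialize($tree)));
    // and loaded on every request:
    $tree = unserialize(gzuncompress(file_get_contents(__DIR__ . '/tree.ser.gz')));

    // Lookup along an "ID path" in the nested array, e.g. 1->5556->123->45->455:
    $node = $tree[1][5556][123][45][455];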

  • Just benchmark it and tell us. And consider memcached as an additional option. Commented Sep 15, 2010 at 22:50
  • Why would you gzip it if memory is not a concern? That's just going to take more processing time. Commented Sep 15, 2010 at 22:53
  • Memcached, apc, & read it into memory from a plain file if it isn't there anymore (server restarts & the like). Commented Sep 15, 2010 at 22:57
  • because gzipping produces a much smaller file (serialized text compresses very well), which cuts down on disk read operations Commented Sep 15, 2010 at 22:57
  • if the file is accessed a lot, it will under normal circumstances already be cached in memory, preventing the need for disk IO. Commented Sep 15, 2010 at 22:59

3 Answers


You'll need to turn the array into a PHP value at some point anyway, so gzip is out.

So if you are deciding between keeping it on disk with something like SQLite, or just letting PHP load it in every time (preferably with APC enabled), the real question is what's more important to you: memory or CPU. If you don't know yet, you're probably suffering from a case of premature optimization.

When it does become relevant to cut down on memory, CPU, or I/O, the answer will be more obvious, so make sure you can refactor easily.

If you want to predict what's better for you, do a benchmark.
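A quick way to do that would be something along these lines (a rough sketch; a loop inside one process only approximates separate requests, especially once APC/opcache gets involved):

    $iterations = 20;

    $t = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $tree = include __DIR__ . '/tree.php';       // plain PHP array file
    }
    printf("include:          %.1f ms\n", (microtime(true) - $t) / $iterations * 1000);

    $t = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $tree = unserialize(gzuncompress(file_get_contents(__DIR__ . '/tree.ser.gz')));
    }
    printf("gz + unserialize: %.1f ms\n", (microtime(true) - $t) / $iterations * 1000);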

Update: I just saw that memory is apparently not a concern. Go for the PHP array and include the file. Easy. Keep in mind, though, that if the total data size is 10 MB, this will be 10 MB per Apache process. At 100 Apache processes that's already 1 GB.




Those load times actually look pretty good considering you are loading the whole file on every request. gzip may help (by reducing the amount of data that has to be read from disk).

If you want it faster, or if that file is likely to get much bigger, consider looking into ways to extract what you want without loading the whole file: pointers to tree nodes, fseek, hashtables, etc.

If you can work out a way to fseek to the data you actually want and load just that, rather than the whole file, it would speed things up a lot.
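As a rough sketch of that idea (the record and index layout here is made up purely for illustration): store each node as its own serialized record plus a small offset index, then read only the record you need:

    // Build step (run once): write every node as its own serialized record
    // and remember its offset and length in a small index array.
    $index = array();                                 // "1/5556/123" => array(offset, length)
    $fp = fopen('nodes.dat', 'wb');
    foreach ($nodes as $path => $data) {              // $nodes: flat "id/id/..." => value map
        $record = serialize($data);
        $index[$path] = array(ftell($fp), strlen($record));
        fwrite($fp, $record);
    }
    fclose($fp);
    file_put_contents('nodes.idx', serialize($index));

    // Per request: load only the small index, then fseek to the one record needed.
    $index = unserialize(file_get_contents('nodes.idx'));
    list($offset, $length) = $index['1/5556/123/45/455'];
    $fp = fopen('nodes.dat', 'rb');
    fseek($fp, $offset);
    $node = unserialize(fread($fp, $length));
    fclose($fp);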



My benchmarks tell the whole story:

Pure PHP file size: ~5 MB

Pure PHP file import: avg load time ~210 ms
Pure PHP file import with APC: avg ~60-80 ms
unserialize(gzuncompress(file_get_contents())): almost constant, ~40 ms per request

I haven't tested SQLite, due to a lack of time to port this tree structure into an SQL DB.

The host is a 4-core Intel Xeon with HT and hardware RAID 5.
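For completeness, if someone does want to try the SQLite variant, one possible (untested, hypothetical) layout is an adjacency list keyed by (parent_id, id), resolving the ID path one level at a time:

    // Hypothetical schema, not taken from the question:
    //   CREATE TABLE nodes (id INTEGER, parent_id INTEGER, name TEXT,
    //                       PRIMARY KEY (parent_id, id));
    function lookupPath(PDO $db, array $ids) {
        $stmt = $db->prepare('SELECT name FROM nodes WHERE parent_id = ? AND id = ?');
        $parent = 0;                         // assume 0 marks the virtual root
        $name = null;
        foreach ($ids as $id) {
            $stmt->execute(array($parent, $id));
            $name = $stmt->fetchColumn();
            if ($name === false) {
                return null;                 // path does not exist
            }
            $parent = $id;
        }
        return $name;                        // name of the last node on the path
    }

    $db = new PDO('sqlite:' . __DIR__ . '/tree.sqlite');
    echo lookupPath($db, array(1, 5556, 123, 45, 455));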

