PHP processing time and CPU usage when comparing millions of rows


April 2019




I have a tool which compares one string with, on average, 250k strings from a database. Two tables are used during the comparison: categories and categories_strings. The strings table has around 2.5 million rows, while the pivot table, categories_strings, contains 7 million rows.

My query is pretty simple: it selects the string columns, joins the pivot table, adds a WHERE clause to specify the category and sets a limit of 10,000. I run this query in a loop, so every batch is 10,000 strings. To make the whole script faster I use the seek method instead of a MySQL OFFSET, which was way too slow on huge offsets. Each batch is then compared using common algorithms such as simple text comparison, levenshtein, etc. This part is simple.
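A minimal sketch of such a seek-method batch loop, for illustration only. The function names (`seekBatches`, `compareBatch`), the `id` and `value` columns, and the SQL in the comment are assumptions, not the actual schema or code. The fetch step is passed in as a callable so the sketch runs against an in-memory array; with MySQL it would wrap a prepared statement.

```php
<?php
/**
 * Yield batches of rows using the seek method: every call asks for rows
 * with id > $lastId, so no OFFSET is needed.
 *
 * Against MySQL, $fetchBatch might run something like (hypothetical schema):
 *   SELECT s.id, s.value FROM strings s
 *   JOIN categories_strings cs ON cs.string_id = s.id
 *   WHERE cs.category_id = ? AND s.id > ?
 *   ORDER BY s.id LIMIT ?
 */
function seekBatches(callable $fetchBatch, $batchSize)
{
    $lastId = 0;
    do {
        $rows = $fetchBatch($lastId, $batchSize);
        if (count($rows) === 0) {
            return;
        }
        yield $rows;
        // Remember the highest id seen; the next batch starts after it.
        $lastId = $rows[count($rows) - 1]['id'];
    } while (count($rows) === $batchSize);
}

/** Compare one string against every row of a batch; returns id => distance. */
function compareBatch($needle, array $rows)
{
    $scores = [];
    foreach ($rows as $row) {
        $scores[$row['id']] = levenshtein($needle, $row['value']);
    }
    return $scores;
}

// In-memory stand-in for the database table, ordered by id.
$table = [];
foreach (['apple', 'apply', 'ample', 'maple', 'angle'] as $i => $value) {
    $table[] = ['id' => $i + 1, 'value' => $value];
}
$fetchBatch = function ($lastId, $limit) use ($table) {
    $matching = array_filter($table, function ($row) use ($lastId) {
        return $row['id'] > $lastId;
    });
    return array_slice(array_values($matching), 0, $limit);
};

$scores = [];
foreach (seekBatches($fetchBatch, 2) as $rows) {
    $scores += compareBatch('apple', $rows); // ids are unique, so union is safe
}
asort($scores); // smallest distance first
```

Because each batch only needs the last id of the previous one, the cost of fetching a batch should stay roughly constant no matter how deep into the table the loop is.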

The question starts here.

On my laptop (Lenovo X230) the whole process for, e.g., 250k compared strings takes 7.4 seconds to load the data from SQL and 13.3 seconds to compare all rows, then 0.1 seconds for sorting and transforming for the view.
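Per-phase numbers like these can be collected with a small timing wrapper. The sketch below is an assumption about how one might measure them, not the code actually used: `measurePhase` is a hypothetical helper, and the loop is only a stand-in for the real comparison phase. It records wall-clock time with `microtime(true)` and user CPU time with `getrusage()`.

```php
<?php
/**
 * Run $phase and return [result, wall-clock seconds, user-CPU seconds].
 */
function measurePhase(callable $phase)
{
    $cpuBefore  = getrusage();
    $wallBefore = microtime(true);

    $result = $phase();

    $wall     = microtime(true) - $wallBefore;
    $cpuAfter = getrusage();
    $cpu = ($cpuAfter['ru_utime.tv_sec'] - $cpuBefore['ru_utime.tv_sec'])
         + ($cpuAfter['ru_utime.tv_usec'] - $cpuBefore['ru_utime.tv_usec']) / 1e6;

    return [$result, $wall, $cpu];
}

// Stand-in workload for the comparison phase.
list($total, $wall, $cpu) = measurePhase(function () {
    $sum = 0;
    for ($i = 0; $i < 20000; $i++) {
        $sum += levenshtein('category string', 'categories string ' . $i);
    }
    return $sum;
});

printf("compare: wall %.3fs, user CPU %.3fs\n", $wall, $cpu);
```

If the user CPU time of a phase is close to its wall-clock time, that phase is most likely CPU-bound; a large gap suggests it spends its time waiting, for example on the database.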

I also have a small dedicated server. Same PHP version, same MySQL. The web server doesn't matter, as I run the script from the command line right now. While it takes about 20 seconds in total on my laptop, on the server it takes... 120 seconds.

So, what is the most important factor affecting the execution time of a long-running PHP program? All I can think of is the CPU, which is worse on the dedicated server: an Intel(R) Atom(TM) CPU N2800 @ 1.86GHz. Memory consumption is pretty low, about 2-4%. CPU usage, however, is around 60% on my laptop and 99.7-100% on the server.

Is the CPU the most important factor in this case? Is there any way to split the work, for example into several processes, so that it takes less time in total? And how can I monitor CPU usage and find out which part of the script consumes the most?

0 answers