I have a script that searches through a log file using a regex filter and writes the matching lines to another file. The regex is fairly straightforward, something like this:
(en|es|fr|zh|ar|)/?(news|publications|about|key-issues|contact-us)
(with a few more keywords in each group).
I have a pretty good idea which keywords in each group get the most matches. Would it improve the script's performance if I put the keywords most likely to match first in the list (for example, 'news' is the most likely to match, followed by 'publications', and so on)? Or does the order not matter? When the script parses a line, does it try to match the first alternative, then the second if that fails, and so on, until it finds a match? And if we know how likely each keyword is to match, is there a way to use that to make the script more efficient?
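
To make the question concrete, here is a minimal sketch of how the two orderings could be compared. It assumes Python's `re` module, which is a backtracking engine that tries alternatives left to right; the actual script may use a different language or engine, where the behaviour could differ. The log lines and the names `COMMON_FIRST`, `COMMON_LAST`, `count_matches` and `time_pattern` are made up for illustration.

```python
import re
import timeit

# Same keywords, opposite order (hypothetical patterns based on the one above).
COMMON_FIRST = re.compile(
    r"(en|es|fr|zh|ar|)/?(news|publications|about|key-issues|contact-us)"
)
COMMON_LAST = re.compile(
    r"(en|es|fr|zh|ar|)/?(contact-us|key-issues|about|publications|news)"
)

# Made-up sample data; a real test should use lines from the actual log file.
SAMPLE_LINES = [
    "GET /en/news/2020/item.html 200",
    "GET /fr/publications/report.pdf 200",
    "GET /static/style.css 200",
] * 1000


def first_alternative_wins():
    # In Python's re, alternatives are tried left to right and the first
    # one that succeeds is used, so "new" is chosen before "news" is tried.
    return re.search(r"new|news", "news").group()


def count_matches(pattern, lines):
    # Count the lines the pattern matches anywhere, as a log filter would.
    return sum(1 for line in lines if pattern.search(line))


def time_pattern(pattern, lines, repeat=5):
    # Best of several runs, in seconds, to reduce timing noise.
    return min(
        timeit.repeat(lambda: count_matches(pattern, lines), number=1, repeat=repeat)
    )


if __name__ == "__main__":
    print("common keywords first:", time_pattern(COMMON_FIRST, SAMPLE_LINES))
    print("common keywords last: ", time_pattern(COMMON_LAST, SAMPLE_LINES))
```

Both patterns select the same lines, so any difference in the timings would come from the order of the alternatives alone. The result will depend on the engine and on how many lines match at all, so it is worth measuring on the real log rather than relying on this toy data.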