Performance Improvements in Haskell

Question

I want to improve my haskell skills of writing really performant code (coming from a C/C++ Background this is important for my ego :D).

So I have written two functions to calculate Pi by the Leibnitz Formula (its not about calculation pi, it was just an example):

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]
calcPiInf = toinf [0..] 0
    where
        toinf = \(x:xs) l -> do 
            let curr = l + calcPiStep x
            curr:toinf xs curr

calcPiInf constructs a infinite List by recursion. calcPiN with a foldl and a lambda with n iterations.

I have found that calcPiInf is faster than calcPiN AND does not run into a stack overflow for too large numbers. First question: is this just because of lazy evaluation?

Secondly I wrote a corresponding C++ Program:

using namespace std;
double calcPi(int n){
    double pi = 0;
    for(size_t i = 0; i < n; i++){
        pi += 4.0*(i%2 == 0?1:-1)/(2.0*i + 1.0);
    }
    return pi;
}
int main(){
    std::cout.precision(10);
    cout << calcPi(5000000) << endl;
}

Which is far faster than my Haskell Solution. Is it theoretically possible to rewrite my Haskell Code to achieve a similar performance as in C++?

If you're interested in a review, then Code Review might be more suitable for your question. — Zeta
– Zeta, Commented Feb 11, 2021 at 10:47
Benchmarking is hard. How did you actually time these? It's not at all obvious, because your two definitions don't even do the same thing: one of them is a list, and the other is a function from number to number. At least it is easy to say that laziness is one possible issue: foldl is never a good idea, and foldl' would be an improvement. — amalloy
– amalloy, Commented Feb 11, 2021 at 10:49
Hardware platform, compilers, compiler versions, optimization levels ? Also did you try with foldl' and foldr ? — jpmarinier
– jpmarinier, Commented Feb 11, 2021 at 10:49
General suggestions: 1) Provide specific type annotations e.g. ::Double, 2) never use foldl, use foldl' for computing numeric values, 3) avoid GHCi, compile with ghc and -O2. — chi
– chi, Commented Feb 11, 2021 at 10:59

Zeta · Accepted Answer · 2021-02-11 11:13:13Z

6

Use foldl' (from Data.List) instead of foldl (and prefer that variant compared to a lazily generated list)
Use explicit type signatures, or you end up with Integer.
Use optimizations (-O2)

The following code takes ~3.599s on my system (GHC 8.0.2, no optimizations)

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using foldl' instead of foldl yields ~1.7s (only ~40% of the original time).

import Data.List
calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl' (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using type signatures yields ~0.8s, or another 50% reduction. If we now add optimizations, we end up with 0.066s, which is still around twice as slow as the C++ variant (0.033s on my machine with -O3, gcc), but it's almost there.

Note that we could also have used -O2 immediately to get below a single second, but any improvement before adding -O2 often (but not necessarily!) also leads to an improvement afterwards.

Here are all times depending on whether type signatures, foldl' or optimization flags were used. Note that type signatures together with -O2 already bring us to close to C++'s speed. However, that behaviour might not hold in general, and we need to change some functions depending on the lazyness:

Type annotations	`foldl'`	`-O2`	Runtime [s]
yes	no	yes	0.063
yes	yes	yes	0.063
no	yes	yes	0.180
no	no	yes	0.190
yes	yes	no	0.825
no	yes	no	1.700
yes	no	no	2.477
no	no	no	3.599

edited Feb 11, 2021 at 11:13

answered Feb 11, 2021 at 11:01

Zeta

106k13 gold badges206 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

chi Over a year ago

Increasing the 5000000 figure here seems to make the gap more evident. GHC is running at 6x times on my machine using 500000000. I did not expect that.

Zeta Over a year ago

@chi 6x times compared to the C++ variant? I still get around factor 2 between GHC's optimized code and g++ -O3, even with 500000000.

Zeta Over a year ago

@chi Huh. Cannot reproduce that gap. I get ~2.272s (g++ -O3) vs ~4.972s (GHC -O2), so ~2.4x slower. However, my GCC and GHC aren't the most recent ones. I'll have a look into the profile.

chi Over a year ago

@Zeta My oldish g++ -O3 7.5.0 seems to be much faster than yours. My GHC -O2 8.6.5 is more in line with your numbers. Neither of us is really using the latest compilers.

Daniel Wagner Over a year ago

On my machine, using -fllvm speeds this up even more, about 6x as fast with -fllvm as without. (As a rule of thumb, heavy numeric computations often benefit from -fllvm.)

|

Collectives™ on Stack Overflow

Performance Improvements in Haskell

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related