3

I want to improve my haskell skills of writing really performant code (coming from a C/C++ Background this is important for my ego :D).

So I have written two functions to calculate Pi by the Leibnitz Formula (its not about calculation pi, it was just an example):

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]
calcPiInf = toinf [0..] 0
    where
        toinf = \(x:xs) l -> do 
            let curr = l + calcPiStep x
            curr:toinf xs curr

calcPiInf constructs a infinite List by recursion. calcPiN with a foldl and a lambda with n iterations.

I have found that calcPiInf is faster than calcPiN AND does not run into a stack overflow for too large numbers. First question: is this just because of lazy evaluation?

Secondly I wrote a corresponding C++ Program:

using namespace std;
double calcPi(int n){
    double pi = 0;
    for(size_t i = 0; i < n; i++){
        pi += 4.0*(i%2 == 0?1:-1)/(2.0*i + 1.0);
    }
    return pi;
}
int main(){
    std::cout.precision(10);
    cout << calcPi(5000000) << endl;
}

Which is far faster than my Haskell Solution. Is it theoretically possible to rewrite my Haskell Code to achieve a similar performance as in C++?

4
  • 1
    If you're interested in a review, then Code Review might be more suitable for your question. Commented Feb 11, 2021 at 10:47
  • 2
    Benchmarking is hard. How did you actually time these? It's not at all obvious, because your two definitions don't even do the same thing: one of them is a list, and the other is a function from number to number. At least it is easy to say that laziness is one possible issue: foldl is never a good idea, and foldl' would be an improvement. Commented Feb 11, 2021 at 10:49
  • 1
    Hardware platform, compilers, compiler versions, optimization levels ? Also did you try with foldl' and foldr ? Commented Feb 11, 2021 at 10:49
  • 3
    General suggestions: 1) Provide specific type annotations e.g. ::Double, 2) never use foldl, use foldl' for computing numeric values, 3) avoid GHCi, compile with ghc and -O2. Commented Feb 11, 2021 at 10:59

1 Answer 1

6
  1. Use foldl' (from Data.List) instead of foldl (and prefer that variant compared to a lazily generated list)
  2. Use explicit type signatures, or you end up with Integer.
  3. Use optimizations (-O2)

The following code takes ~3.599s on my system (GHC 8.0.2, no optimizations)

calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using foldl' instead of foldl yields ~1.7s (only ~40% of the original time).

import Data.List
calcPiStep k = (if even k then 4.0 else -4.0) / (2.0*fromIntegral k + 1.0)
calcPiN n = foldl' (\p x -> p + calcPiStep x) 0.0 [0..n]

main = print $ calcPiN 5000000

Using type signatures yields ~0.8s, or another 50% reduction. If we now add optimizations, we end up with 0.066s, which is still around twice as slow as the C++ variant (0.033s on my machine with -O3, gcc), but it's almost there.

Note that we could also have used -O2 immediately to get below a single second, but any improvement before adding -O2 often (but not necessarily!) also leads to an improvement afterwards.

Here are all times depending on whether type signatures, foldl' or optimization flags were used. Note that type signatures together with -O2 already bring us to close to C++'s speed. However, that behaviour might not hold in general, and we need to change some functions depending on the lazyness:

Type annotations foldl' -O2 Runtime [s]
yes no yes 0.063
yes yes yes 0.063
no yes yes 0.180
no no yes 0.190
yes yes no 0.825
no yes no 1.700
yes no no 2.477
no no no 3.599
Sign up to request clarification or add additional context in comments.

7 Comments

Increasing the 5000000 figure here seems to make the gap more evident. GHC is running at 6x times on my machine using 500000000. I did not expect that.
@chi 6x times compared to the C++ variant? I still get around factor 2 between GHC's optimized code and g++ -O3, even with 500000000.
@chi Huh. Cannot reproduce that gap. I get ~2.272s (g++ -O3) vs ~4.972s (GHC -O2), so ~2.4x slower. However, my GCC and GHC aren't the most recent ones. I'll have a look into the profile.
@Zeta My oldish g++ -O3 7.5.0 seems to be much faster than yours. My GHC -O2 8.6.5 is more in line with your numbers. Neither of us is really using the latest compilers.
On my machine, using -fllvm speeds this up even more, about 6x as fast with -fllvm as without. (As a rule of thumb, heavy numeric computations often benefit from -fllvm.)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.