• sugar_in_your_tea@sh.itjust.works
    link
    fedilink
    arrow-up
    2
    ·
    edit-2
    5 months ago

    Exactly. We have hundreds of thousands of lines of code that work reasonably well. I think we made the important decisions correctly, so performance issues in one area rarely impact others.

    We rewrote ~1k lines of poorly running Fortran code into well-written Python code, and that worked because we got the important parts right (reduced big-O CPU from O(n3) to O(n2 log n) and memory from O(n4) to O(n3)). Runtime went from minutes to seconds in medium size data sets, and made large data sets possible to run (those would OOM due to O(n4) storage in RAM). If you get the important parts right, Python is probably good enough, and you can get linear optimizations from there by moving parts to a compiled language (or use a JIT like numba). Python wasn’t why we could make it fast, it’s just what we prototyped with so we could focus on the architecture, and we stopped optimizing when it was fast enough.