Towards a new SymPy

(oscarbenjamin.github.io)

141 points · asmeurer · 2y ago · 67 comments

carapace 2y ago

Part II made the front page yesterday: https://news.ycombinator.com/item?id=37426080

A comment there makes what I think is a very good point about "the lack of consolidation of computer algebra efforts": https://news.ycombinator.com/item?id=37430437

I don't know what might drive or foster such consolidation. Maybe Category Theory? Bridging syntax?

viscousviolin 2y ago

How about Lean? [0] There's a whole library of mathematics written down in Lean called Mathlib, which spans most of the undergraduate maths curriculum up to some cutting-edge research-level maths. I've commented under the Part II post you linked to as well, describing how I think Mathlib could help the CAS ecosystem.

[0] https://leanprover-community.github.io/

staunton 2y ago

That's an entirely different thing though. Good luck getting Lean to help you do any symbolic computation whatever. You can use it to prove that a given manipulation is correct. You cannot use it to find a result (there may be a symbolic math library for Lean eventually but currently there isn't).

bmitc 2y ago

Mathlib is not nearly as complete as advertised. It is very much a collection of research projects with little cohesion.

mathisfun123 2y ago

The unfortunate thing is that core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules and it goes without saying (but I'll say it anyway) implementing that is "hard". Naively (i.e., I haven't really thought through it), I think sympy should be built on top of z3 or cvc5 or at least very tightly integrated (e.g., support as a backend). And (again naively) I think until such time, sympy will remain a toy.

I will say though that symengine is a great project and congrats to that guy for pulling it off under the constraints of a phd.

StableAlkyne 2y ago

Not an expert in SAT solvers, personally. What would the benefit be in using one?

Anaconda's absurdly slow dependency resolver (it takes up to 15 minutes for things Pip installs in seconds, with no way of disabling it outside of installing a third party solver) is based on SAT and left a bad taste in my mouth for them (slow, clunky, etc), but maybe it's just a poor implementation on their part.

mathisfun123 2y ago

> Anaconda's absurdly slow dependency resolver

On the one hand this is like comparing apples and oranges. On the other hand SAT is slow for dependency management not because the solver itself is slow (it might well be - I don't know which solver they packaged) but because it's an NP-complete problem.

Back to sympy: a computer algebra system (CAS) is primarily (IMHO) an algebra system, not a matrix manipulation library or a pde solver or whatever kitchen sink collection of things is in all of them. Algebra in this context means manipulating algebraic expressions and that's term rewriting and that's also NP-complete (well at least in some form or fashion, eg egraph extraction).

So in summary - there's no way out of using SAT/SMT here.

> slow, clunky, etc

Just to put a finer point on this - it's a very shallow thing to look at conda's or whoever's implementation and then paint over SAT/SMT with that same brush. The way that I usually describe z3 is that it is nuclear weapons grade industrial software. I mean jesus christ its stated goal is solving NP-complete/hard problems and it frequently succeeds at this goal on problems with millions of decision variables and clauses. It is absolutely the highest tech piece of tech out there (XYZ pytorch/tensorflow ai ml thing pales in comparison) and we are all extremely lucky that it is licensed permissively and developed completely in the open (and basically by one guy!). And it is being used in many many places for very serious engineering.
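For readers unfamiliar with the phrase "term rewriting" used above, here is a minimal sketch of plain rule-based rewriting on nested tuples. The tuple encoding and the three rules are assumptions made for illustration; this is not how SymPy, z3, or cvc5 work internally, and it involves no SAT/SMT search at all.

```python
def rewrite(term):
    """Apply a few simplification rules bottom-up to a nested-tuple expression."""
    if not isinstance(term, tuple):
        return term  # a symbol (str) or a number
    op, left, right = term
    left, right = rewrite(left), rewrite(right)
    if op == "+":
        if left == 0:      # 0 + x -> x
            return right
        if right == 0:     # x + 0 -> x
            return left
    if op == "*":
        if left == 0 or right == 0:  # x * 0 -> 0
            return 0
        if left == 1:      # 1 * x -> x
            return right
        if right == 1:     # x * 1 -> x
            return left
    return (op, left, right)

# (x * 1) + (y * 0)
simplified = rewrite(("+", ("*", "x", 1), ("*", "y", 0)))
```

With a fixed rule order like this, the result is deterministic; the hard part discussed in the thread starts when many rules apply at once and one has to choose among the resulting forms.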

scoresmoke 2y ago

Yes, this is a big disadvantage. But have you tried Mamba, which aims at implementing Anaconda more efficiently? It works really well in most cases.

https://mamba.readthedocs.io/

bonzini 2y ago

Fedora's dnf also uses SAT and it's way faster (with caching) than the handwritten resolver it replaced.

abdullahkhalids 2y ago

Miniconda has a better dependency resolver, and it can resolve in less than a minute in most cases.

philzook 2y ago

You might enjoy ruler https://github.com/uwplse/ruler

It would be very interesting for SMT and CAS to converge a bit more. SMT in expressiveness and domains and CAS in rigor.

The modality of their usage is different. CAS tends to return some expressions of interest, which it is hard to get SMT to do. Either you get "unsat" or a particular model from an SMT solver, not a simplified expression (ok, z3 has a simplify command, which is pretty cool).

SMT today is not obviously expressive enough to handle most of the domains and questions that come up in CAS systems.

Most SMT solvers do not intrinsically handle transcendental functions or any notions of calculus, abstract algebra, etc.

CAS systems are largely interested in problems of equational reasoning, whereas SMT's bread and butter is gluing together "trivialities" like linear inequalities and congruence closure with SAT search.

c-cube 2y ago

There's a workshop exploring that: http://www.sc-square.org/CSA/welcome.html . They're trying to bridge cas and smt.

maple3142 2y ago

I am not sure current SAT solvers are good at solving the things that a good CAS can do. They are fast at a lot of operations related to bits (xor, shift, and, or ...) but they perform way worse for things like solving a linear system in a finite field. (This is all from my personal experience, so I may be wrong.)
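For context on what "solving a linear system in a finite field" involves, a minimal sketch follows: Gauss-Jordan elimination over GF(p) in plain Python. The function name `solve_mod_p` is hypothetical, and the sketch assumes a prime modulus and a square, invertible matrix; a real CAS would use far more optimized routines.

```python
def solve_mod_p(a, b, p):
    """Solve a x = b over GF(p) for a square, invertible matrix a (p prime)."""
    n = len(a)
    # Augmented matrix [a | b] with every entry reduced mod p.
    m = [[v % p for v in row] + [rhs % p] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular mod p")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)  # modular inverse, Python 3.8+
        m[col] = [(v * inv) % p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col]:
                factor = m[r][col]
                m[r] = [(x - factor * y) % p for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

# 2x + 3y = 1 and x + 4y = 2 over GF(7)
solution = solve_mod_p([[2, 3], [1, 4]], [1, 2], 7)
```

This is cubic-time arithmetic with no search, which is one way to see why encoding the same problem as Boolean clauses for a SAT solver can be a poor fit.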

mathisfun123 2y ago

> linear system in a finite field

I said it below, but I'll repeat it here: in my humble opinion, this is not what you want from a CAS. This is functionality better delegated to a BLAS (yes even with the finite field qualifier). And just because both CAS and BLAS have A in them, does not mean they are the same thing.

abecedarius 2y ago

Is there an existing CAS built on top of a SAT or SMT solver?

mathisfun123 2y ago

oss? none that i'm aware of that use a SAT/SMT solver for the term rewriting (like i'm suggesting). closed source, my strong intuition is both mathematica and magma work this way.

Q6T46nT668w6i3m 2y ago

Toy? SymPy has room for improvement but it has made a tremendous impact in research and industry.

sheepshear 2y ago

"Toy" is solver jargon that sort of means there's an alternative that blows it out of the water.

toth 2y ago

Is Mathematica built like this?

mathisfun123 2y ago

Yes but I doubt they're using z3 or cvc5 or any other oss sat/smt solver.

bmitc 2y ago

> core CAS functionality (and performance) should be based on a SAT/SMT engine to discover the rewrite rules

Why is that? What are the alternatives?

7thaccount 2y ago

I really liked the article and how it explained CAS vs Numerical solutions.

It also looks like SymPy or SymEngine is starting to catch up to Mathematica which also is pretty cool and does the same kind of expansion of an expression into a tree of sub expressions.

abdullahkhalids 2y ago

Is there any comparison of their features anywhere? Last time I tried sympy a few years ago, it was quite a bit lacking compared to Mathematica.

7thaccount 2y ago

I don't personally know, but I assume it'll take many years to catch up with Mathematica, which has symbolic computing as its bread and butter, with a large number of developers adding to that codebase since like the 80s.

eigenket 2y ago

I love SymPy. It's so useful for doing calculations I don't want to do myself.

qubex 2y ago

I’ve tried to appreciate SymPy but I always find myself running home to Mathematica. There’s simply no comparison. SymPy is like a match and Mathematica has the power of a sizeable thermonuclear warhead.

rowanG077 2y ago

It's true. Unfortunately Mathematica simply can't be used in many domains. I would really like to integrate Mathematica with a type checker for automatic theorem proving. I think it could greatly alleviate the clunkiness of dependent types.

nequo 2y ago

Do you know of attempts to integrate SymPy in this way?

sheepshear 2y ago

What's preventing it from being used?

bionhoward 2y ago

If you have fewer primitives and terminals than there are UTF-8 characters (1.1 million), then you could ditch OOP expression trees altogether and use simple strings in Polish notation with a mapping of UTF-8 characters to operations (simple lambdas). That way you don’t need __dict__ on every node of every tree. However, you’d have to rewrite the stuff which expects the OOP trees to instead expect Polish notation strings. This approach scales a lot further than classes because you reduce the memory cost of the algebraic expressions down to the simplest string to represent them (and even smaller if you pack the bits into an ANS; that’s a performance hit to reduce memory more).
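A minimal sketch of the idea above, assuming a hypothetical one-character-per-operation table (`OPS`) and single-character variable names: the expression lives in a plain string in Polish (prefix) notation and is evaluated by a recursive scan, with no node objects.

```python
import operator

# Hypothetical table: one character per binary operation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_prefix(expr: str, env: dict) -> float:
    """Evaluate a Polish-notation string; any non-operator character is a variable."""
    def walk(i: int):
        ch = expr[i]
        if ch in OPS:
            left, i = walk(i + 1)   # first operand starts right after the operator
            right, i = walk(i)      # second operand starts where the first ended
            return OPS[ch](left, right), i
        return env[ch], i + 1       # a variable consumes exactly one character

    value, end = walk(0)
    if end != len(expr):
        raise ValueError("trailing characters after a complete expression")
    return value

# (x + y) * x is written "*+xyx" in prefix form.
result = eval_prefix("*+xyx", {"x": 3, "y": 4})
```

The whole expression costs one character per node here, at the price of re-scanning the string whenever a subexpression has to be located.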

pxeger1 2y ago

At that point it's hard to justify not just writing it in C directly

haberman 2y ago

I remember being a kid and fawning over the upgrade from a TI-86 (which could not do symbolic manipulation) to the TI-89 (which could).

As an adult and OSS enthusiast, I've often wondered if there is an OSS option that can at least match, and ideally exceed, the TI-89's capabilities. Is SymPy it? I've had a few reasonably good experiences with SymPy, but I don't know much about the theoretical underpinnings of CAS, or how SymPy compares to competing offerings.

taeric 2y ago

Depends what you mean? Mathematica is quite impressive.

Many symbolic options exist in lisps. https://stackoverflow.com/questions/10355112/why-is-lisp-so-... is a good answer that goes over some of the reasons for that.

haberman 2y ago

Mathematica is certainly as powerful as a TI-89, but not OSS.

HelloNurse 2y ago

Regarding the issue of representing symbolic expressions and controlling their evaluation or simplification, are there precedents of using e-graphs to memoize and reuse work rather than simple trees with destructive updates?

philzook 2y ago

The Herbie project uses egraphs to explore different ways of rewriting floating point expressions. https://herbie.uwplse.org/ One can also write custom rulesets in egglog (a new egraph rewriting system / language / datalog) https://egraphs-good.github.io/egglog/?example=herbie

The approach is not yet anywhere near being able to touch all the domains sympy can handle. Destructive term rewriting tends to be a bit more forgiving of unsoundness in the rules and still returns roughly meaningful results. EGraph rewriting (and other automated reasoning systems) tend to just return junk as soon as you aren't careful about your semantics. Associativity and commutativity are ubiquitous in CAS applications and encoding these concepts in general purpose terms is rather unsatisfying. The post above emphasizes specialty methods for polynomials, which it would be desirable to find a clean way to integrate into egraph techniques. Variable binding (which is treated in a rather mangled form in CAS systems) is seemingly important for treating summation, differentiation, and integration correctly. The status of doing variable binding efficiently and correctly in egraphs is also unclear imo.

roger_ 2y ago

SymPy is pretty nice but every time I use it for a real problem I end up hitting a wall and have to dig through the source or look at old issue reports for a workaround.

Most recently I wanted to simplify a complex expression with terms like ‘diag(v1) * v2’ into Hadamard products, and found I’d need to implement custom rules to get it to work.
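The identity behind that rewrite is diag(v1) · v2 = v1 ∘ v2 (the elementwise product). A small numeric check in plain Python, with hypothetical helper names and no SymPy involved:

```python
def diag_matvec(v1, v2):
    """Compute diag(v1) @ v2 by explicitly building the diagonal matrix."""
    n = len(v1)
    d = [[v1[i] if i == j else 0 for j in range(n)] for i in range(n)]
    return [sum(d[i][j] * v2[j] for j in range(n)) for i in range(n)]

def hadamard(v1, v2):
    """Elementwise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(v1, v2)]

v1, v2 = [2, 3, 5], [7, 11, 13]
same = diag_matvec(v1, v2) == hadamard(v1, v2)
```

This only checks the identity on concrete numbers; getting a CAS to apply it symbolically is the part that needs the custom rules described above.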

amelius 2y ago

Python is great in DL, so any chance we'll see a combination of neural nets and computer algebra in a new sympy?

j2kun 2y ago

If the author is reading this: please add an rss feed to the blog! I'd love to follow along for updates

kzrdude 2y ago

I think banking on SymEngineX ("SEX") for the Sympy 2.0 release would be interesting branding.

I'm also cheering for Sympy, I think its longevity now still predicts success in the future.

alanbernstein 2y ago

Excited to hear about the new LaTeX+SEX stack
