For my current attempts, I bit off more than I could chew:
I tried to build a system that not only recognizes regular languages, but also serves as a parser for them (a la Parsec).
Parsing pushes you to support something like fmap, but the whole derivatives-based approach needs more 'introspection', so supporting general mapping via fmap (ie arbitrary functions a->b) is out. You can only support things that you have more control over than opaque functions.
(And in general, I am doing bifunctors, because I want the complement of the complement to be the original thing.)
Sorry if that's a bit confused. If I were a better theoretician, I could probably work it out.
I haven't touched the code in a while. But recently I have thought about the theory some more. The Brzozowski derivative introduced the concept of multiplicative inverse of a string. I am working out the ramifications of extending that to the multiplicative inverse of arbitrary regular expressions. (The results might already be in the literature. I haven't looked much.)
I don't expect anything groundbreaking to come out of that, but I hope my understanding will improve.
> And these particular features just don't really pull their weight IMO. They are performance footguns, and IMO, are also tricky to reason about inside of regex syntax.
Well, in theory I could 'just' write a preprocessor that takes my regex with intersection and complement and translates it to a more traditional one. I wouldn't care too much if that's not very efficient.
I'm interested in those features because of the beauty of the theory, but it would also help make production regular expressions more modular.
Eg say you have a regular expression that decides what's a valid username for someone signing up to your system. You decide to use email addresses as your usernames, so the main requirement is that users can receive email at that address. But because usernames will be visible to other users, you have some additional requirements:
'.{0,100} & [^@]*@[^@]* & not (.*(root|admin|<some offensive term>).*@.*) & not (.*<sql injection>.*)'
That's a silly example. I think in production, I would be more likely to see something as complicated as this in eg some ad-hoc log parsing.
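To make the username example concrete, here is a minimal Python sketch of a Brzozowski-derivative matcher where intersection and complement are ordinary constructors. It is a toy under stated assumptions, not a production design: every name in it (`deriv`, `nullable`, `upto`, `lit`, and so on) is made up for illustration, there is no parser (the regex is built from combinators), and it reads the pattern as `[^@]*@[^@]*` and `.*(root|admin).*@.*`. The two placeholder clauses (offensive term, SQL injection) are left out because the example doesn't say what they are.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty: pass            # matches no string at all

@dataclass(frozen=True)
class Eps: pass              # matches only the empty string

@dataclass(frozen=True)
class Chars:                 # one char in `chars` (outside it if `negate`)
    chars: frozenset
    negate: bool = False

@dataclass(frozen=True)
class Cat:                   # concatenation
    a: object
    b: object

@dataclass(frozen=True)
class Alt:                   # union
    a: object
    b: object

@dataclass(frozen=True)
class And:                   # intersection
    a: object
    b: object

@dataclass(frozen=True)
class Not:                   # complement
    a: object

@dataclass(frozen=True)
class Star:                  # Kleene star
    a: object

# Smart constructors: apply a few algebraic identities so terms stay small.
def cat(a, b):
    if isinstance(a, Empty) or isinstance(b, Empty): return Empty()
    if isinstance(a, Eps): return b
    if isinstance(b, Eps): return a
    return Cat(a, b)

def alt(a, b):
    if isinstance(a, Empty): return b
    if isinstance(b, Empty) or a == b: return a
    return Alt(a, b)

def and_(a, b):
    if isinstance(a, Empty) or isinstance(b, Empty): return Empty()
    if a == b: return a
    return And(a, b)

def nullable(r):  # does r accept the empty string?
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Chars)): return False
    if isinstance(r, (Cat, And)): return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt): return nullable(r.a) or nullable(r.b)
    if isinstance(r, Not): return not nullable(r.a)
    raise TypeError(r)

def deriv(r, c):  # regex for "what may follow after reading c"
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chars):
        return Eps() if (c in r.chars) != r.negate else Empty()
    if isinstance(r, Cat):
        left = cat(deriv(r.a, c), r.b)
        return alt(left, deriv(r.b, c)) if nullable(r.a) else left
    if isinstance(r, Alt): return alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, And): return and_(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Not): return Not(deriv(r.a, c))
    if isinstance(r, Star): return cat(deriv(r.a, c), r)
    raise TypeError(r)

def matches(r, s):
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

def lit(s):  # literal string as a chain of single-character matches
    r = Eps()
    for ch in reversed(s):
        r = cat(Chars(frozenset(ch)), r)
    return r

def upto(r, n):  # r{0,n}, expanded as nested optionals
    out = Eps()
    for _ in range(n):
        out = alt(Eps(), cat(r, out))
    return out

ANY = Chars(frozenset(), negate=True)       # "."
NOT_AT = Chars(frozenset("@"), negate=True)  # "[^@]"
one_at = cat(Star(NOT_AT), cat(lit("@"), Star(NOT_AT)))
reserved = cat(Star(ANY), cat(alt(lit("root"), lit("admin")),
                              cat(Star(ANY), cat(lit("@"), Star(ANY)))))
username = and_(upto(ANY, 100), and_(one_at, Not(reserved)))
```

The point is that `And` and `Not` cost one line each in `deriv` and `nullable`, which is why they are so cheap to add in theory. With this definition, `matches(username, "alice@example.com")` accepts and `matches(username, "root@example.com")` rejects. The sketch does no real normalisation of terms, so it says nothing about the performance concerns in the quote.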
> The issue is that building a production grade regex engine---even when it's restricted to regular languages---requires a lot more engineering than theory.
Amen to that!