Regex Puzzle

(bbc.co.uk)

388 points | mboto | 8y ago | 90 comments

bluesmoon 8y ago

Well, there is https://regexcrossword.com/

samjs 8y ago

The OP converted into this format: https://regexcrossword.com/playerpuzzles/595e5542d2433

angry_albatross 8y ago

There's an error in the second "Across" expression. [^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2} should be [^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}

rpearl 8y ago

Fixed the typo: https://regexcrossword.com/playerpuzzles/595e8c8e86584

kreetx 8y ago

Nice!

bshimmin 8y ago

Brilliant. My dad is 71, loves puzzles (like cryptic crosswords and Sudoku), is a huge technophobe, and has just retired. This should keep him busy until about 2022.

baron816 8y ago

Technophobe or technophile?

bshimmin 8y ago

Technophobe. Just acquiring the necessary Google skills to find out what the proper regex rules are will probably take him till Christmas. But hey, the man loves a challenge.

sergiosgc 8y ago

Technophobe, probably. A technophile crossword aficionado should have this solved by dinner time.

canada_dry 8y ago

Regex is one of those tools that I use a couple times a year - usually for cleaning up lousy input data.

I always end up spending a fair amount of time using tools like:

http://regex.inginf.units.it/

https://regex101.com/

http://www.regexr.com/

And of course stackoverflow.

TACIXAT 8y ago

I use https://www.debuggex.com/ a lot when it's a complex expression. The visualization really helps.

squeaky-clean 8y ago

I will always keep a Windows install at the ready just for RegexBuddy. I use it mostly to take a regex and generate the code I need for it (e.g. find the first numbered group in a match in JavaScript), without having to remember language-specific details.

rgb122 8y ago

Why don't you just learn Linux and PCRE? What use are 2-bit Windows tools? Don't fill your head with Windows, a dead OS walking.

Drdrdrq 8y ago

Really? I use it fairly often, usually for cleaning input data and similar. But I only use a subset of regex functionality that works across different engines and that doesn't cause problems with string escaping (no backslashes and similar).

seven800 8y ago

You might also be interested in a tool I made which generates random strings that match a given regex:

http://regexicon.com/

Very useful when code reviewing other people's regular expressions.

simlevesque 8y ago

I love the fact that you can have unit tests for your regexes on regex101.

vacri 8y ago

As I get more experienced in operations, I find regex to be more and more invaluable. I used to dread doing a regex, now I get enjoyment out of sorting out a tricky one.

rosstex 8y ago

That first link, oh my god! I can have some fun with this.

hokkos 8y ago

I worked on a project where some XSD files defined fields with regex restrictions, and some rules over those fields added other, stricter regexps or negative regexps depending on context, in a format called Schematron. I had to generate XML files conforming to those XSDs, so I used some tools built around the Z3 solver and Microsoft.Automata to generate strings conforming to multiple regexps. It would convert the regexps to finite automata, intersect them, and walk the result from the starting state to a final one over a charset.

Links :

https://www.microsoft.com/en-us/research/publication/symboli...

https://www.microsoft.com/en-us/download/details.aspx?id=523...

It now seems to be Open Source (MIT):

https://github.com/AutomataDotNet/Automata
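
The intersect-then-walk idea described above can be sketched in a few lines. This is only an illustration in Python, not the Microsoft.Automata API: the two DFAs are written out by hand (for the regexes a[ab]* and [ab]*bb, both invented for this example), and `shortest_common_string` is a hypothetical helper that does a breadth-first walk over pairs of states, which is the product automaton built on the fly.

```python
from collections import deque

# A DFA is (start, accepting_states, transitions), where transitions maps
# (state, char) -> state; a missing entry means the DFA rejects.
# Hand-built DFA for the regex a[ab]*  (strings starting with 'a').
STARTS_WITH_A = (0, {1}, {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1})
# Hand-built DFA for the regex [ab]*bb  (strings ending in 'bb').
ENDS_WITH_BB = (0, {2}, {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 0, (1, "b"): 2,
    (2, "a"): 0, (2, "b"): 2,
})

def shortest_common_string(dfa1, dfa2, alphabet):
    """Breadth-first walk of the product automaton; returns the shortest
    string accepted by both DFAs, or None if the intersection is empty."""
    (s1, acc1, t1), (s2, acc2, t2) = dfa1, dfa2
    start = (s1, s2)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (q1, q2), word = queue.popleft()
        if q1 in acc1 and q2 in acc2:
            return word
        for ch in alphabet:
            # Follow a character only if both automata can consume it.
            if (q1, ch) in t1 and (q2, ch) in t2:
                nxt = (t1[(q1, ch)], t2[(q2, ch)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + ch))
    return None

print(shortest_common_string(STARTS_WITH_A, ENDS_WITH_BB, "ab"))  # abb
```

A real tool would first compile each regexp to an automaton; that step is left out here to keep the sketch short.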

eru 8y ago

There's also redgrep (https://github.com/google/redgrep) that supports intersection and complements of regular expressions.

I am toying with the idea of writing a little game where player A thinks of a regular expression, and player B tries to guess. If B guesses right, they win. If B guesses wrong, A has to provide a false positive and a false negative (if they exist), and B gets to guess again.

Can you think of ways to automate the roles of A and/or B?

long 8y ago

In computer science academia, this kind of game is called grammar induction (of which inferring regular expressions is a special case).

A classic algorithm for inferring regular expressions was given by Angluin: https://people.eecs.berkeley.edu/~dawnsong/teaching/s10/pape...

(This isn't quite the same setup as you're thinking of but there are a ton of variations on the basic idea)

hokkos 8y ago

I think AutomataDotNet can do all that:

Automating regular expression generation seems easy: use RE fragments and aggregate them, or walk the type hierarchy of the RE AST and generate them randomly.

B needs to guess A's RE, so we need to generate examples of strings belonging to it to give hints: this is exactly the use case of AutomataDotNet.

Also, if B guesses an RE that is equivalent to A's RE, it seems unfair not to award a win, so we need to tell whether two REs belong to the same equivalence class. AutomataDotNet does have an AreEquivalent method.

You can automate the generation of a false positive and a false negative with the Minus method, which creates an automaton that accepts A-B or B-A, and then generate an example from it.
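
For small alphabets, player A's side of the game can be roughed out without any automata library. The sketch below is a brute-force stand-in for the A-B / B-A difference automata, not AutomataDotNet's API: `counterexamples` is a hypothetical helper that enumerates every string up to `max_len` and reports the first disagreement in each direction. Because the search is bounded, getting `(None, None)` back only suggests equivalence; it does not prove it.

```python
import re
from itertools import product

def counterexamples(secret, guess, alphabet="ab", max_len=4):
    """Return (false_positive, false_negative) for `guess` against `secret`.

    false_positive: shortest string the guess accepts but the secret rejects.
    false_negative: shortest string the secret accepts but the guess rejects.
    Either is None if no such string exists up to max_len.
    """
    secret_re, guess_re = re.compile(secret), re.compile(guess)
    false_pos = false_neg = None
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            in_secret = secret_re.fullmatch(s) is not None
            in_guess = guess_re.fullmatch(s) is not None
            if in_guess and not in_secret and false_pos is None:
                false_pos = s
            if in_secret and not in_guess and false_neg is None:
                false_neg = s
    return false_pos, false_neg

# Player A's secret is (ab)*, player B guesses a*b*.
print(counterexamples(r"(ab)*", r"a*b*"))  # ('a', 'abab')
```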

jfries 8y ago

How did you solve backreferences with that approach?

eru 8y ago

You don't. Backreferences leave the space of mathematical regular expressions that all this nice theory works for.

hokkos 8y ago

I was lucky that XSD regexes don't support backreferences.

jgrahamc 8y ago

Worth doing this by hand to exercise your knowledge of regular expressions. My solution (SPOILER): http://imgur.com/a/9iK9J

simias 8y ago

Well done. I don't know what you think but I found that most of the time the character classes would intersect perfectly (i.e. there'd only be one character possible once you intersect both sides of a single square). That made it pretty easy overall since for the vast majority of the board you don't have to worry about the "context".

But I guess if it's meant for an audience of folks not very familiar with regexes it's difficult enough as it is.

jgrahamc 8y ago

I thought it was pretty easy given that the character classes meant that it was pretty easy to take a row/column and eliminate possibilities.

stedaniels 8y ago

Well, thanks. I tried to cheat with https://github.com/blukat29/regex-crossword-solver and got hit with lex parsing errors! My limited python and 5 minute effort resulted in failure! At least I got to read the message though :-)

rootlocus 8y ago

I'm assuming the solution isn't unique because I found some positions that are under-constrained.

vhold 8y ago

I only found one column to have multiple solutions, and saved it for last, at which point only one option made sense.

angry_albatross 8y ago

Which positions? I didn't find any that were under-constrained.

vacri 8y ago

Ah, but you have underscores where there should be spaces! ;)

KineticLensman 8y ago

This BBC report refers to a puzzle released by the UK's National Cyber Security Centre [1], as part of an online recruitment effort.

[1] https://www.ncsc.gov.uk/news/take-our-regex-crossword-challe...

cag_ii 8y ago

Interestingly, Bletchley Park is known to have used crossword puzzles published in The Daily Telegraph as a recruitment tool for "codebreakers" during WWII.

https://en.wikipedia.org/wiki/Bletchley_Park#Personnel

arien 8y ago

So I suppose it is a one-time thing only? A shame, it was quite fun to solve!

simlevesque 8y ago

There you go: https://regexcrossword.com/

Cephlin 8y ago

Wow, finally a crossword I have a chance at!

dbrgn 8y ago

If you want a challenge, try this one: http://twiki.org/p/pub/Codev/TWikiPresentation2013x03x07/reg...

dmit 8y ago

Originally from the MIT Mystery Hunt:

https://devjoe.appspot.com/huntindex/puzzle/mit2013601

PDF: http://web.mit.edu/puzzle/www/2013/coinheist.com/rubik/a_reg...

rspeer 8y ago

Note that as a Mystery Hunt puzzle, the goal of the puzzle isn't just to fill in the grid, it's to find the answer, a secret word or phrase that would be filled into another puzzle (the metapuzzle).

The puzzles generally don't tell you how to extract the answer, but the idea is you know it when you see it.

gregable 8y ago

I also created an HTML-based version of this one some time ago that allows rotation and color-codes the rows as matching or not: https://gregable.com/p/regexp-puzzle.html

proactivesvcs 8y ago

Thanks for the gregex in HTML format! (gets coat)

andyjohnson0 8y ago

I know that there are problems to do with regex matching that are NP-hard. So I'm wondering if it is possible to attack this puzzle using an algorithm that simplifies the individual regexes using knowledge of the regexes that they interact with?

eutectic 8y ago

This problem is NP-hard by reduction from SAT. Treat each column as a truth variable and use the rows to encode CNF clauses. For example, `(A | ^C)` becomes `(1..)|(..0)`. Then set all the column regexes to `( 0* )|( 1* )` to enforce a consistent truth value for each variable.
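
A small sketch of that reduction, assuming clauses are given as lists of signed 1-based integers (a convention chosen here, not something from the comment): `clause_to_regex` builds one row regex per clause, and `solve` is a hypothetical brute-force checker that fills the grid and tests every row and column with Python's `re`. The column regex is written without the spaces, on the guess that they were only there to stop the asterisks being read as formatting.

```python
import re
from itertools import product

def clause_to_regex(clause, num_vars):
    """Encode a CNF clause as a row regex, one alternative per literal.

    Literals are signed 1-based ints: 1 means x1, -3 means NOT x3.
    """
    alternatives = []
    for lit in clause:
        cells = ["."] * num_vars
        cells[abs(lit) - 1] = "1" if lit > 0 else "0"
        alternatives.append("(" + "".join(cells) + ")")
    return "|".join(alternatives)

def solve(clauses, num_vars):
    """Brute-force the crossword: return a satisfying row such as '010', or None."""
    row_regexes = [clause_to_regex(c, num_vars) for c in clauses]
    column_regex = "(0*)|(1*)"
    for bits in product("01", repeat=num_vars):
        row = "".join(bits)
        grid = [row] * len(clauses)  # identical rows keep every column constant
        rows_ok = all(re.fullmatch(rx, r) for rx, r in zip(row_regexes, grid))
        cols_ok = all(
            re.fullmatch(column_regex, "".join(r[j] for r in grid))
            for j in range(num_vars)
        )
        if rows_ok and cols_ok:
            return row
    return None

print(clause_to_regex([1, -3], 3))  # (1..)|(..0)
```

Since the column regexes force every column to be constant, only grids whose rows are all identical can pass, which is why enumerating single rows covers every candidate. The enumeration is exponential in the number of variables, as expected for a SAT encoding.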

jonahx 8y ago

Could you elaborate on the encoding? What are valid mappings?

eutectic 8y ago

You could expand each regex to a regex on the whole table, and then take the intersection of the corresponding NFA/DFAs. Unfortunately, I suspect this takes exponential (or worse?) time in the worst case.

chpatrick 8y ago

You can solve it using a SAT solver like Z3. There are particularly elegant solutions in Haskell that basically interpret the regex but on "symbolic" characters rather than real ones. You can then ask what values these characters can take such that all regexes match.

Some implementations: https://github.com/ekmett/ersatz/tree/master/examples/regexp... https://gist.github.com/LeventErkok/4942496

IanCal 8y ago

It certainly should be. The limited set of things in the puzzle helps as well.

For example, you can split each one of these regexes up into smaller ones based on positioning. Now some of them are simply "match any of the following characters", which can be combined together with set intersections (and something similar with "none of these characters").
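
As a rough illustration of the set-intersection step, here is a Python sketch. The two classes are taken from the expression quoted elsewhere in the thread, but pairing them as the row and column clue for one square is hypothetical, as are the `ALPHABET` choice (uppercase letters plus space) and the `char_class` helper.

```python
import string

ALPHABET = set(string.ascii_uppercase + " ")

def char_class(chars, negated=False):
    """Set of characters allowed by [chars] or, if negated, by [^chars]."""
    members = set(chars)
    return ALPHABET - members if negated else members & ALPHABET

# Hypothetical clues for one square: the row allows [EFUG], the column
# allows [^PZVJG], so the square must lie in the intersection.
row_allows = char_class("EFUG")
column_allows = char_class("PZVJG", negated=True)
print(sorted(row_allows & column_allows))  # ['E', 'F', 'U']
```

When the intersection shrinks to a single character, the square is solved without looking at the rest of the row or column.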

mcbobbington 8y ago

I love regexes. In addition to doing cool things and saving time, I feel like I'm a "real programmer" whenever I write a good one.

gargarplex 8y ago

This comic artistically renders that feeling. I, too, know it well.

https://xkcd.com/208/

Already__Taken 8y ago

Anyone know a decent Android app for these? The MIT one has the most insane and broken scrolling functionality; it's shocking.

Xophmeister 8y ago

That wasn't as hard as I thought it would be. I was worried that, without start/end of string anchors, things could get quite hairy, but the biggest stretch of logic was just, "There are five spaces for me to fit a character, an optional character and two two-character sequences. Therefore that optional character must not appear."

Emyr42 8y ago

Column H pattern starts [MVFU]{2}, and 3 of those options don't match the Row 0 pattern, leaving "U"

The published solution says H0 should be "S".

Emyr42 8y ago

The version at https://www.ncsc.gov.uk/content/files/regex_cross_hard_v3.pn... has S[MVU]... for column H.

Guess nobody tested it.

shabble 8y ago

Does any common regex format/dialect require '\-' for a literal hyphen? AFAIK it's only special inside character classes, and escaping it doesn't necessarily work there if it would form a valid range identifier.

okdana 8y ago

I don't know of any that require it. But it's common to see punctuation characters escaped like that because Perl (and PCRE and its various cousins/descendants) allows you to escape any non-meta-character and have it treated as a literal.

I suppose the two main benefits are

(a) neither the writer nor the reader has to remember which punctuation characters are meta-characters (you just have to remember that it's always a literal if it's escaped), and

(b) in implementations like PHP's which try to replicate the Perl-style 'delimited' syntax (e.g., /foo/), it prevents characters in the pattern from conflicting with the delimiters.

Maybe there's some other advantage but I can't think of what.
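
A quick way to see the escaped-hyphen behaviour in one engine is to try it. The checks below use Python's `re` module only; other dialects may differ, particularly inside character classes (POSIX bracket expressions, for instance, do not treat a backslash as an escape), so treat this as a sketch of one implementation rather than a general rule.

```python
import re

# Outside a character class, an escaped hyphen is just a literal hyphen.
assert re.fullmatch(r"a\-b", "a-b")
assert re.fullmatch(r"a-b", "a-b")

# Inside a class, the unescaped hyphen forms a range ...
assert re.fullmatch(r"[a-z]", "m")
# ... while the escaped one is a literal member of the set.
assert re.fullmatch(r"[a\-z]", "-")
assert re.fullmatch(r"[a\-z]", "m") is None

# re.escape backslash-escapes regex special characters so that the
# resulting pattern matches the original text literally.
escaped = re.escape("a-b.c")
assert re.fullmatch(escaped, "a-b.c")
print(escaped)
```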

jwilk 8y ago

Direct link to the crossword:

https://ichef.bbci.co.uk/images/ic/976xn/p057t19t.jpg

ape4 8y ago

Since the clues are machine parsable it should be machine solvable.

hermanschaaf 8y ago

It is indeed machine-solvable; I wrote a solver for regexcrossword.com puzzles a while back (https://github.com/hermanschaaf/regex-crossword-solver). It was great fun, maybe even more than solving the puzzles by hand!

mtharrison 8y ago

Will your tool work on this puzzle though? I don't think so because it has backreferences.

gumby 8y ago

Nice! At Kepler's in Mountain View you can buy a version of Scrabble that uses regexes. The designer used to sell it in front of the shop -- he is obviously a programmer.

tzakrajs 8y ago

Could I stop by this afternoon and expect it to be in stock or was this a temporary offering?

gumby 8y ago

It wasn't a short-term item, but poor Kepler's has shrunk so much, who knows if it's in stock or not. I would call them. It wasn't described as using regexes of course, so you'll have to say something like "that special version of Scrabble".

The designer is local so if they no longer stock it you could look online...but it's better to get it from the shop if you can.

eutectic 8y ago

Can you please expand a bit on how it worked?

gumby 8y ago

Never played it, but from looking at the box or talking to the inventor, it was just that the set of letters included regex operators, and those you could put on the board as well, meaning other people could use them as well to make words.

timdierks 8y ago

I believe column E is under-constrained; a solution with column E = "YYYY " or "OOOO " passes the tests, but is clearly not what's intended.

timdierks 8y ago

Never mind, this was an error in the https://regexcrossword.com/playerpuzzles/595e5542d2433 version, which has a (.) where it should have a \1 in row 2 (thanks to @angry_albatross).

angry_albatross 8y ago

In the expression in the second row, the 5th character must match the 12th character, which must be R.
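
The difference between the typo and the fix is easy to check with Python's `re`. The two test strings below are made up to fit the pattern's length of 14 and are not the puzzle's solution; they only show that `\1` forces the 12th character to repeat the 5th, while a second `(.)` accepts anything there.

```python
import re

TYPO = r"[^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2}"
FIXED = r"[^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}"

# Built from pieces so the positions are easy to count: 4 + 1 + 6 + 1 + 2.
same = "AAAA" + "R" + "EEEEEE" + "R" + "AA"       # 5th and 12th both R
different = "AAAA" + "R" + "EEEEEE" + "X" + "AA"  # 12th is X instead

for name, pattern in (("typo", TYPO), ("fixed", FIXED)):
    print(
        name,
        bool(re.fullmatch(pattern, same)),
        bool(re.fullmatch(pattern, different)),
    )
# typo True True
# fixed True False
```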

IanCal 8y ago

Fun! I made a few mistakes by writing letters sideways which was then confusing (C vs U, for example), but this was a nice puzzle.

gozur88 8y ago

That's a very odd thing to see in a mainstream publication.
