AutoRegex

(autoregex.xyz)

247 points · fbuilesv · 3y ago · 109 comments

Cyberdog3y ago

Haven't tried it since I'd rather not sign up for an account. But regarding this example:

   Minimum eight characters, at least one letter, one number and one special character
    >> ^(?=.*[A-Za-z])(?=.*d)(?=.*[@$!%*#?&])[A-Za-zd@$!%*#?&]{8,}$

I assume the "d" characters here are supposed to be "\d" but the backslashes wandered off somewhere. Not sure if that's just in the example or the actual output from the AI. Some of those special characters like "*" and "$" need a backslash too.

Also, this pattern will not allow any character which is not a letter, number, or in that very finite list of special characters - so no spaces, no carets, no quote marks or apostrophes, etc. Not allowing certain printable characters in a password is a bad practice (I won't complain too much if you forbid the non-printable ones).
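A minimal sketch of both points above, assuming Python's `re` module (the site does not state its flavor) and assuming the intended pattern is the displayed one with `\d` restored. The sample passwords are made up for illustration.

```python
import re

# The pattern exactly as shown on the site (backslashes missing).
AS_DISPLAYED = re.compile(r"^(?=.*[A-Za-z])(?=.*d)(?=.*[@$!%*#?&])[A-Za-zd@$!%*#?&]{8,}$")

# Assumed intent: the same pattern with \d restored.
RESTORED = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!%*#?&]{8,}$")


def is_accepted(pattern: re.Pattern, candidate: str) -> bool:
    """Return True if the candidate password satisfies the pattern."""
    return pattern.search(candidate) is not None


# Hypothetical passwords: a plain one, then ones with a space, caret, apostrophe.
for candidate in ["abc123!x", "abc 123!x", "abc123^x!", "it's1234!"]:
    print(repr(candidate), is_accepted(RESTORED, candidate))

# As displayed, digits are not in the character class at all,
# while a literal "d" is required by the second lookahead.
print(is_accepted(AS_DISPLAYED, "abc123!x"), is_accepted(AS_DISPLAYED, "abcdefg!"))
```

Only the first sample passes the restored pattern; the space, caret, and apostrophe each fall outside the final character class.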

asicsp3y ago

    All words starting with the letter "n" and ending with "g", case insensitive
    >> /^nw*g$/i

Yeah, seems like backslash got gobbled up in some sort of parsing before what gets displayed on the site. Should be `\w` here.

Regarding your other point, characters like `*` won't need escaping within character class. Not sure why only those particular set of characters are deemed "special characters" - why not `-`, `=`, parentheses, curly braces, etc?

Also, the site should specify which regex flavor is being generated, I'd guess JS.

All said, it is good to see tools to help with regex, as long as the solutions will be tested to make sure it fits their particular problem.

Cyberdog3y ago

> Regarding your other point, characters like `*` won't need escaping within character class.

Really? Huh. I've been escaping them anyway, and I think I'll continue to do so just to avoid ambiguity when reading patterns.

I was about to end this post with "an extra backslash never hurt anybody" but then I realized that this is HN and writing that would surely prompt someone to reply with some bizarre case where an extra backslash caused a cascade of failures which cost a company fifteen million dollars and/or killed 43 people.

mhitza3y ago

That regex doesn't do what the description above it says, with or without a backslash.

thrdbndndn3y ago

No, you don't need to escape these special characters if they're in [].

For JS (which is used here) or Python at least. Some other implementations may vary.
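A quick check of this in Python's `re` (one of the two flavors named above; other engines may differ). Inside `[...]`, quantifiers and anchors such as `*`, `$`, and `?` are literal, so escaping them is harmless but unnecessary. The characters that do need care inside a class are `]`, `\`, a leading `^`, and a `-` between two characters.

```python
import re

unescaped = re.compile(r"[@$!%*#?&]")
escaped = re.compile(r"[@\$!%\*#\?&]")

text = "a*b$c?d"
# Both classes pick out the same literal characters.
print(unescaped.findall(text))
print(escaped.findall(text))

# A hyphen between two characters forms a range unless it is escaped.
print(re.findall(r"[a-z]", "a-m"))    # range a..z
print(re.findall(r"[a\-z]", "a-m"))   # the three characters a, -, z
```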

colordrops3y ago

I don't trust regular expressions that I wrote, let alone some doped up parameter sniffing AI.

memorable3y ago

You can always use tools like Regex101[0] to verify if they actually work or not. I have tried a few generated by the AI, and it seems to do the job most of the time.

[0]: https://regex101.com/

croes3y ago

You still could have edge cases you don't want or want

mysterydip3y ago

I think the worst case here is it writing a regex that mostly works but fails for some edge cases that you don't think to test but will encounter in production.

treis3y ago

It'd be cool if it split out a bunch of test cases/examples so you can see what's happening in edge cases.

celticninja3y ago

That's always an issue with regex regardless of who wrote it

wpietri3y ago

Totally! And this is one of the worst kinds of code to generate with AI given how often regexes are write-only code. Personally, for any important regex I'm either going to have good unit tests, an extended-mode regex with comments, or both. Which I'm sure this AI is not going to do.

So to me this mainly looks like a way for people who don't understand something to put that ignorance into the codebase, setting traps for colleagues down the road. That's not a new experience for me, but this does seem likely to make that easier and more fun, two things I don't think dangerous code needs.
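For anyone who hasn't seen the combination described above, here is a rough sketch of an extended-mode (verbose) regex with comments plus a few assertion-style tests, using Python's `re.VERBOSE`. The pattern itself (a 24-hour `HH:MM` time) is a made-up example, not something from the site.

```python
import re

# Hypothetical example: a 24-hour HH:MM time, written in verbose mode so
# each piece of the pattern can carry its own comment.
TIME = re.compile(
    r"""
    (?P<hour>[01]\d|2[0-3])   # hours: 00-19 or 20-23
    :                         # literal separator
    (?P<minute>[0-5]\d)       # minutes: 00-59
    """,
    re.VERBOSE,
)

SHOULD_MATCH = ["00:00", "14:51", "23:59"]
SHOULD_NOT_MATCH = ["24:00", "14:60", "1:05", "14:51 "]


def check() -> bool:
    """Return True if every positive case matches and every negative case does not."""
    positives = all(TIME.fullmatch(s) for s in SHOULD_MATCH)
    negatives = not any(TIME.fullmatch(s) for s in SHOULD_NOT_MATCH)
    return positives and negatives


print(check())
```

Keeping the negative cases next to the pattern is the part that tends to catch the "mostly works" failures.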

funstuff0073y ago

fair enough, but few things are easier to test than a regex.

russfink3y ago

Few things are easier to miss than a string that breaks your (nontrivial) regex.

I’d like to see a mathematical estimate of the number of test strings I should generate given some input regex.

unsignedint3y ago

Not an apples-to-apples comparison, but if you want software help with regex, my daily driver for building regexes is RegexBuddy [1]. It not only makes building and testing regular expressions easy, but also covers pretty much all the regex variants in the wild. (And it comes with an excellent help file.)

The same author also makes a variant called RegexMagic [2], which probably has a premise closer to AutoRegex (minus the NN part, perhaps), as it is designed for building regexes without much regex knowledge, but I don't know how well it works as I haven't used it much...

[1]: https://www.regexbuddy.com/

[2]: https://www.regexmagic.com/

walls3y ago

regex101 does almost all of this but in the browser!

gd3kr3y ago

Hey! I’m the creator of www.autoregex.xyz (@gd3kr on twitter). I originally built it as a small side project in a couple of days, and I was absolutely not expecting a response this massive. I realise there are concerns about the email and password sign-up; I built it in an hour-ish with firebase auth as a temporary way to cap the sudden surge in GPT3 requests to the server after the twitter post gained traction. I’m working on a better approach that doesn't require creating an account. Otherwise, all suggestions are absolutely welcome. Please tell me how I can make this a better experience for everyone.

knolan3y ago

You could probably include a ‘Why do I need to sign up?’ disclaimer.

mkl3y ago

Two of your three examples are incorrect as they're missing backslashes. /^nw*g$/i should be /^n\w*g$/i and (?=.*d) should be (?=.*\d).
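To see what the missing backslash changes, here is a small comparison in Python's `re` (an assumption; the site's flavor isn't stated). Without the backslash, `w*` means "zero or more literal w characters". The test words are made up.

```python
import re

as_displayed = re.compile(r"^nw*g$", re.IGNORECASE)
with_backslash = re.compile(r"^n\w*g$", re.IGNORECASE)

# Hypothetical test words.
for word in ["nothing", "Nag", "nwwg", "ng"]:
    print(
        word,
        bool(as_displayed.search(word)),
        bool(with_backslash.search(word)),
    )
```

The displayed pattern accepts only `n`, any number of `w`s, then `g`; the restored one accepts all four words.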

chowells3y ago

So I was talking to a friend about this, and he thought this was a parody because of all the obviously incorrect results you were highlighting in the Twitter thread. Other people have mentioned the \ escapes going missing, but my friend called out https://twitter.com/gd3kr/status/1545495732265766913 as so hilariously wrong that it couldn't have been anything other than a parody.

Is this really a parody, or is it just another example of people not actually reading GPT-3 output carefully enough to notice it's nonsense?

gd3kr3y ago

Hey, no it’s not a parody lmao. I shipped it because I realised it could have at least some utility while building it. To be fair, the wrong examples were an oversight on my part, but I thought of them as more of a genuine benchmark of what GPT3 was capable of; sort of like an experimental feature. I’m working hard on refining the results by fine tuning DaVinci and getting the output as close to ideal as possible; I think there’s a lot of potential there. In the end, even boosting a user’s productivity marginally is a win in my book, and as a lot of people have pointed out, it’s already doing that to some extent.

kelnos3y ago

Was curious to try it, but got turned off when asked to create an account, and didn't bother.

GeneralPie3y ago

How do I delete my account?

polskibus3y ago

I’d bet users would’ve preferred to provide examples of input and output to get the regexp they want, instead of designing it in plain English.

texaslonghorn53y ago

That seems like a computationally challenging problem. To avoid .+ you would have to include non matching examples and then I don't know how similar to the matches / specific those would have to be.

asicsp3y ago

Something like https://regex-generator.olafneumann.org/ ?

asah3y ago

Plain English has the benefit that you can use the microphone/dictation on your cellphone...

wtetzner3y ago

Why would you need to generate a regex on your cell phone?

wruza3y ago

Btw, does anyone know a library/program which can reverse engineer a regex from multiple source strings? E.g.

  14:51 [info] 51 some message
  … more of 51 lines …
  15:22 [error] 24 error!
  … more of 24 lines …

  ^(\d\d:\d\d) \[(info|error)\] (\d+) (.+)$

Or maybe not a regex, but a structured pattern.
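For reference, this is what the hand-written target pattern above extracts from the two sample lines, checked with Python's `re` (the choice of Python is an assumption; the pattern and lines are taken from the comment).

```python
import re

LOG_LINE = re.compile(r"^(\d\d:\d\d) \[(info|error)\] (\d+) (.+)$")

lines = [
    "14:51 [info] 51 some message",
    "15:22 [error] 24 error!",
]

for line in lines:
    match = LOG_LINE.match(line)
    # groups() yields (time, level, count, message) for a matching line.
    print(match.groups() if match else None)
```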

manx3y ago

One probably wants to provide a set of matching and a set of non-matching strings. Then the software would output a regex and some edge-case matching strings and non-matching strings.

This could be built using set operations on deterministic finite automata (dfa). Every regex is equivalent to a dfa. You can now construct automata for every positive and negative example input. Then calculate the union for all positive examples and the union for all negative examples. And finally calculate the difference between the two unions. Convert the resulting automaton back to regex.

https://scanftree.com/automata/dfa-union-property

wruza3y ago

I was thinking of something that could categorize parts of these strings into a “language”, so there are no non-matching strings. It’s hard to specify in a formal way, but by looking at these strings you may see that e.g. […] is a static syntactic element, and a number follows it, and time precedes it. This would be nice to have to browse logs (which these strings are obviously a part of) but instead of scrolling through thousands of rows, see all of the patterns that occur among them at once, and then dig down into a pattern to inspect what happened and when to improve the “health” of a complex system. Of course if you know all of them in advance, it’s easy to filter by each. But lots of software/apis do not document their output in such detail.

ailef3y ago

Technically .* is a valid regex for those strings, so the issue here is not only to reverse engineer them, but to do so in a way that's meaningful for the person who has to use it after.

It shouldn't be hard to start with .* and recursively split it in two parts that still match the input strings, but I believe you will end up with matching but useless regexes.

Banana6993y ago

This is a special case of the general problem of program synthesis[1][2][3][4], where the search space of possible programs are all regex strings and the seed driving the synthesis is Input-Output examples.

There's research [5][6] as well as practical tools [7][8][9].

[1] https://en.wikipedia.org/wiki/Program_synthesis

[2] https://www.microsoft.com/en-us/research/project/program-syn...

[3] https://dl.acm.org/doi/10.1145/1836089.1836091

[4] https://royalsocietypublishing.org/doi/10.1098/rsta.2015.040...

[5] https://cs.stanford.edu/~minalee/pdf/gpce2016-alpharegex.pdf

[6] https://www.researchgate.net/publication/261794574_Automatic...

[7] https://regex-generator.olafneumann.org/

[8] http://regex.inginf.units.it/extract/

[9] https://stackoverflow.com/questions/6219790/need-a-regex-too...

amake3y ago

The closest thing I know of to this is https://github.com/devongovett/regexgen (or my Ruby port https://github.com/amake/regexgen-ruby).

    % bundle exec bin/regexgen '14:51 [info] 51 some message' '15:22 [error] 24 error!'
    (?-mix:1(?:4:51\ \[info\]\ 51\ some\ message|5:22\ \[error\]\ 24\ error!))

With enough inputs it should end up with something somewhat reasonable for the leading part, but it will never be smart enough to understand that the error message is "arbitrary" and should be matched with e.g. `(.+)`.
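To illustrate why output like the above only factors out shared leading text, here is a simplified trie-based sketch in Python. It is not how regexgen is implemented; `build_trie` and `trie_to_regex` are hypothetical helpers written for this example. The result matches exactly the given strings and nothing else, so it never generalizes the message part.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Return a regex fragment matching exactly the suffixes stored under node."""
    branches = []
    optional = False
    for ch, child in sorted(node.items()):
        if ch == "":
            optional = True  # a word ends here, so the rest is optional
        else:
            branches.append(re.escape(ch) + trie_to_regex(child))
    if not branches:
        return ""
    if len(branches) == 1 and not optional:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if optional else group


examples = ["14:51 [info] 51 some message", "15:22 [error] 24 error!"]
pattern = trie_to_regex(build_trie(examples))
compiled = re.compile(pattern)

print(pattern)
print(all(compiled.fullmatch(e) for e in examples))
# A new log line of the same shape is rejected: nothing was generalized.
print(compiled.fullmatch("16:03 [info] 7 another message"))
```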

eurasiantiger3y ago

JS has String.prototype.replaceAll, which can take a regex with multiple capture groups and output them as separate params to a callback function. This can be used to create a functional DSL which generates the regexes and callbacks.

junon3y ago

I know this comment isn't helpful on its own, but yes this exists. I've seen it before. I just have no idea what it was called or how to find it again.

EDIT: Ah no, sorry. Was thinking of the other way around[0].

0: https://www.npmjs.com/package/regex-to-strings

nerdponx3y ago

RegexBuddy has some limited ability to do this, and the author of that program has a whole separate program called RegexMagic that I believe specializes in exactly this.

tomerv3y ago

http://regex.inginf.units.it/extract/

georgia_peach3y ago

It is surprising / not surprising to see the lengths people go to not learn regex. There's not much to it. If you're just starting out, find a good reference, and memorize it. Mastery is another thing entirely, but it always is.

In *nix land, a decent reference is built-in: `man awk` and jump to the "Regular Expressions" section.

jwilk3y ago

There are many awk implementations, so whether that reference is decent or not may vary. It looks like on BSDs, there's no such section at all.

georgia_peach3y ago

Tested on Linux, OpenBSD, & Mac:

POSIX: man 7 re_format

PCRE: man pcrepattern

PCRE2: man pcre2pattern

PCRE may be a "works on my machine" thing, but POSIX should be there.

gnulinux3y ago

    All words starting with the letter "n" and ending with "g", case insensitive
    >> /^nw*g$/i

This is not right. That should be a "\w" instead of "w".

curiousgal3y ago

Same with the last one, it should be \d instead of d. Seems like an escape character issue.

emilfihlman3y ago

It's otherwise wrong, too. It shouldn't have ^ or $, and \w matches non word characters. It should be /n[a-z]*g/gi
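Which pattern is "right" depends on whether the input is a single word or running text. A rough comparison in Python's `re` (an assumption; the thread guesses the site targets JS), on a made-up sentence: the unanchored form also matches inside longer words, while a `\b`-bounded variant (added here for illustration, not proposed in the thread) matches whole words only.

```python
import re

# Made-up sample sentence.
text = "Nothing is as annoying as a nagging bug"

# The suggested pattern: no anchors, letters only.
unanchored = re.findall(r"n[a-z]*g", text, flags=re.IGNORECASE)

# Illustrative variant with word boundaries on both sides.
bounded = re.findall(r"\bn[a-z]*g\b", text, flags=re.IGNORECASE)

# The site's pattern (backslash restored) only matches when the
# whole input is a single word.
anchored = re.search(r"^n\w*g$", text, flags=re.IGNORECASE)

print(unanchored)
print(bounded)
print(anchored)
```

The unanchored search picks up `nnoying` from inside `annoying`; the bounded one does not.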

memorable3y ago

I have played with this for a while, and here are some prompts that might be interesting.

Prompt: "URL regex"

  ^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

Prompt: "Email regex"

  /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))

This one's weirdly sophisticated for some reason.

Prompt: "An HTML tag with a close tag and attributes"

  <([a-zA-Z]+)[^>]*?(?<!\/)>.*?<\/\1>

valvar3y ago

Email addresses are notoriously hard to parse.[0]

I'd wager that one probably misses a bunch of corner cases.

[0] https://www.regular-expressions.info/email.html

tragomaskhalos3y ago

This is all true, but suspect that the issue is mostly self-correcting since anything other than an extremely vanilla email address is likely to run into an infuriatingly high number of rejections precisely because the software at the other end has taken an overly simplistic view of what is allowed, forcing the owner to change it out of sheer frustration.

mdaniel3y ago

> Prompt: "An HTML tag with a close tag and attributes"

Obviously trained on a data set that didn't include <h1>

This whole thread is why I'm not scared of Copilot or similar taking away jobs anytime soon, since bug fixing is way harder than writing code
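The `<h1>` failure is easy to reproduce. A sketch in Python's `re` (assumed flavor): the tag-name group `[a-zA-Z]+` cannot include the digit, so the backreference looks for `</h>` instead of `</h1>`. The second pattern is one possible adjustment, offered as an assumption rather than a full fix; regexes remain a poor fit for HTML in general.

```python
import re

# The generated pattern from the comment above.
GENERATED = re.compile(r"<([a-zA-Z]+)[^>]*?(?<!\/)>.*?<\/\1>")

# Illustrative tweak: allow digits after the first letter of the tag name.
WITH_DIGITS = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)[^>]*?(?<!\/)>.*?<\/\1>")

samples = ['<p class="x">hi</p>', "<h1>Title</h1>", "<br/>"]

for html in samples:
    print(
        html,
        bool(GENERATED.search(html)),
        bool(WITH_DIGITS.search(html)),
    )
```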

kseistrup3y ago

Also regex-related: Pomsky: https://pomsky-lang.org/ (formerly Rulex)

nerdponx3y ago

I've seen a handful of libraries and DSLs that are intended to replace regex now. It might be interesting to compile a list of them and attempt to compare them.

layer83y ago

> difficult to […] comprehend

You’ll still have to fully comprehend the auto-generated regex to make sure that it really does what you want. So the tool may help with coming up with a suitable regex, but doesn’t remove the need for comprehension.

PenguinRevolver3y ago

Tried RegEx → English:

    \"object\": \[(.*?)\]
    ----
    The regular expression matches a string that contains "object": followed by a space and an open bracket, then any characters, then a close bracket. The characters between the open and close bracket are captured in a group.

Pretty cool.

anyfactor3y ago

As a web scraper, thank god for .*? or to be exact [\s\S]*?

Does 90% of what I need.
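The reason for preferring `[\s\S]*?` over `.*?` when scraping: by default `.` does not match a newline. A small demonstration in Python's `re` (assumed flavor; the HTML snippet is made up). Python's `re.DOTALL` flag gives the same effect as the `[\s\S]` trick.

```python
import re

# Made-up multi-line snippet.
html = "<div>\nfirst line\nsecond line\n</div>"

dot = re.search(r"<div>(.*?)</div>", html)
any_char = re.search(r"<div>([\s\S]*?)</div>", html)
dotall = re.search(r"<div>(.*?)</div>", html, flags=re.DOTALL)

print(dot)                       # no match: "." stops at the newline
print(repr(any_char.group(1)))   # captures across lines
print(any_char.group(1) == dotall.group(1))
```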

rcshubhadeep3y ago

Did a hobby project a while ago, just the reverse. Wrote a blog post - https://medium.com/codist-ai/generating-natural-language-des... (colab - https://colab.research.google.com/drive/1QibOifIJQB2tfLyy_mm...)

Also gave a pycon talk - https://www.youtube.com/watch?v=Zugbqg9HFHQ

It was fun, achieved good result but did not need a monster like GPT3!

ramigb3y ago

Why do I have to sign up?

spaniard892773y ago

Mmm, it produces an expression matching the input? You should put some examples. I'm a noob but AFAIK there are different regex engines out there, which one is this output? I remember trying other regex generators in knime and not working.

mouzogu3y ago

I definitely need something like this. My brain seems to insta-delete all Regex knowledge after 1 week.

However, why do I need to sign up to test it. I'm guessing there is some paywall after. This feels like that github co-pilot rugpull.

sourabhv3y ago

Switch from "regex to english" and write "email".

Instead of saying it will match the string "email", it says the opposite, english -> regex conversion

> regex = /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i

> This regular expression is used to match an email address. It starts by matching any character that is not a space or the "@" symbol, followed by the "@" symbol, then any characters that are not a space or the "." symbol, followed by a "." character, and finally any characters that are not a space.

tragomaskhalos3y ago

"Something I can use to grep the word list on my Mac to cheat at Worldle. My remaining letters are ...., my green letters thus far are .... and my yellows are ..."

yoav_lavi3y ago

I created Melody (https://github.com/yoav-lavi/melody) for the same reasons you've mentioned in the website. I knew AI would try to replace me one day, but I didn't think it'd be so soon.

Kidding of course, looks really cool!

amrrs3y ago

Here's how you can do it without GPT-3 (using BLOOM an opensource alternative) https://twitter.com/1littlecoder/status/1545818058153140224

MiddleMan53y ago

I always imagined having a tool like this! Really psyched to have something to play with!

memorable3y ago

A tool like this really cuts the time!

napier3y ago

This is the kind of use case I’d like to see more of! Less of the p-zombie copywriting neoplagiarism services with pot-luck output, more GPT-as-backend functional apps as productivity multipliers, if that makes sense.

texaslonghorn53y ago

I typed in

> Email

> English → RegEx

> 95

> GO

> \w+@\w+\.\w+

That's an interesting email regex.
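A rough look at what that pattern accepts, using Python's `re` (assumed flavor) and made-up addresses. As a full-string validator it rejects dots or plus signs in the local part and multi-label domains; as a search it silently matches only a fragment of those addresses.

```python
import re

SIMPLE = re.compile(r"\w+@\w+\.\w+")

# Hypothetical addresses for illustration.
addresses = [
    "john@example.com",
    "john.doe@example.com",
    "john@mail.example.co.uk",
    "first+tag@example.com",
]

for addr in addresses:
    whole = SIMPLE.fullmatch(addr) is not None
    fragment = SIMPLE.search(addr).group(0)
    print(addr, whole, fragment)
```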

iforgotpassword3y ago

That's not that wrong though.

What does it do for queries like "all male English names" or "comfortable temperature range"?

memorable3y ago

For "comfortable temperature range", it generates:

  (\d+\.?\d*)\s?-\s?(\d+\.?\d*)

And for "all male English names", it generates:

  [A-Z][a-z]+

The first one might be good, but the latter seems rather unsophisticated.

tgv3y ago

I typed "an email address" and it came up with

    [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

which is similar, but has some interesting differences. That shows it's black-box association. Then I tried "my email address" and this came out, line break included

   is john.doe@example.com
   My email address is \w+\.\w+@\w+\.\w+

"An identifier" vs "a Javascript identifier" does work as expected, but "a number" and "a floating point number" don't. "A quoted string" doesn't escape the quotes inside, but if you add "with escaped quotes" it does.

So, it's cute, and might set you on the right track, as long as you study the output a bit.

emilfihlman3y ago

>All words starting with the letter "n" and ending with "g", case insensitive >> /^nw*g$/i

The example regex is completely wrong, it should be /n[a-z]*g/ig

skyfalldev3y ago

You also can use this for general GPT-3 queries, which is quite cool!

adamwintle3y ago

How did you do that?

irony1233y ago

Excellent. Worked for me. The output can be modified, but this provides a great start. I do regex so seldom that I can't remember the notation for lookahead, for example, which this provides.

revskill3y ago

Hmmm, it might work, but it needs a unit test to verify.

Terry_Roll3y ago

Different programming languages use different forms of RegEx; the site doesn't suggest it caters for this difference.

ricardobayes3y ago

If we only could get safari to support lookbehind regex that would be nice.

ggurface3y ago

Doesn't work for me.

> any word that has a penultimate character of a > .*[a-z]n$

mutant3y ago

Just... Learn regex. Jeez.

vladdoster3y ago

Needs a way to delete your account

croes3y ago

That's why disposable email addresses exist

SMAAART3y ago

My prayers have been answered.

bruhhh3y ago

the font is so tiny on the website preview I cant read anything

ViraCz3y ago

damn this was good
