Minimum eight characters, at least one letter, one number and one special character
>> ^(?=.*[A-Za-z])(?=.*d)(?=.*[@$!%*#?&])[A-Za-zd@$!%*#?&]{8,}$
I assume the "d" characters here are supposed to be "\d" but the backslashes wandered off somewhere. Not sure if that's just in the example or the actual output from the AI. Some of those special characters like "*" and "$" need a backslash too.
Also, this pattern will not allow any character which is not a letter, number, or in that very finite list of special characters - so no spaces, no carets, no quote marks or apostrophes, etc. Not allowing certain printable characters in a password is a bad practice (I won't complain too much if you forbid the non-printable ones).
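To make the point about the restricted character set concrete, here is a minimal Python sketch. It assumes the bare "d" really was meant to be "\d", and the sample passwords are made up purely for illustration.

```python
import re

# Assumption: each bare "d" in the quoted output was meant to be "\d".
PASSWORD = re.compile(
    r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[@$!%*#?&])[A-Za-z\d@$!%*#?&]{8,}$"
)


def is_accepted(candidate: str) -> bool:
    """Return True if the restored pattern accepts the whole candidate."""
    return PASSWORD.fullmatch(candidate) is not None


# Hypothetical sample passwords, chosen only to exercise the pattern.
for candidate in ["abc123!x", "correct horse 1!", "pass^word1!", "it's-a-Pass1!"]:
    print(f"{candidate!r}: {is_accepted(candidate)}")
```

Only the first sample passes. The others contain a space, a caret, or an apostrophe and hyphen, none of which are in the final character class.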
All words starting with the letter "n" and ending with "g", case insensitive
>> /^nw*g$/i
Yeah, seems like the backslash got gobbled up in some sort of parsing before the output gets displayed on the site. Should be `\w` here.
Regarding your other point, characters like `*` won't need escaping within a character class. Not sure why only that particular set of characters is deemed "special characters" - why not `-`, `=`, parentheses, curly braces, etc?
Also, the site should specify which regex flavor is being generated, I'd guess JS.
All said, it is good to see tools that help with regex, as long as the solutions are tested to make sure they fit the user's particular problem.
Really? Huh. I've been escaping them anyway, and I think I'll continue to do so just to avoid ambiguity when reading patterns.
I was about to end this post with "an extra backslash never hurt anybody" but then I realized that this is HN and writing that would surely prompt someone to reply with some bizarre case where an extra backslash caused a cascade of failures which cost a company fifteen million dollars and/or killed 43 people.
For JS (which is used here) or Python at least. Some other implementations may vary.
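A quick way to check this for Python's `re` module (a sketch only; other flavors may behave differently, as noted above) is to compare an unescaped and an escaped version of the same character class. The probe string is made up.

```python
import re

# Same set of "special characters", without and with backslashes.
unescaped = re.compile(r"[@$!%*#?&]")
escaped = re.compile(r"[@\$!%\*#\?&]")

# Hypothetical probe string containing every character in the class.
probe = "a@b$c!d%e*f#g?h&i"

plain_hits = unescaped.findall(probe)
escaped_hits = escaped.findall(probe)
print(plain_hits == escaped_hits, plain_hits)
```

Both classes find the same eight characters, so the extra backslashes are harmless but not required here. Inside a class, the characters that do need care are `]`, `\`, a leading `^`, and a `-` between two other characters.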
So to me this mainly looks like a way for people who don't understand something to put that ignorance into the codebase, setting traps for colleagues down the road. That's not a new experience for me, but this does seem likely to make that easier and more fun, two things I don't think dangerous code needs.
I’d like to see a mathematical estimate of the number of test strings I should generate given some input regex.
The same author also makes a variant called RegexMagic [2], which probably has a premise closer to AutoRegex's (minus the NN part, perhaps), as it is designed for building regexes without much knowledge of regex. I don't know how well it works, though, as I haven't used it much...
Is this really a parody, or is it just another example of people not actually reading GPT-3 output carefully enough to notice it's nonsense?
14:51 [info] 51 some message
… more of 51 lines …
15:22 [error] 24 error!
… more of 24 lines …
^(\d\d:\d\d) \[(info|error)\] (\d+) (.+)$
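For reference, a small Python sketch of what that target regex extracts from the two header lines. Reading the third group as the number of continuation lines is my interpretation of the example, not something stated outright.

```python
import re

# The hand-written target pattern from the example above.
LOG_HEADER = re.compile(r"^(\d\d:\d\d) \[(info|error)\] (\d+) (.+)$")

lines = [
    "14:51 [info] 51 some message",
    "15:22 [error] 24 error!",
]

# Each tuple is (time, level, count, message).
parsed = [LOG_HEADER.match(line).groups() for line in lines]
for fields in parsed:
    print(fields)
```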
Or maybe not a regex, but a structured pattern.
This could be built using set operations on deterministic finite automata (DFA). Every regex is equivalent to a DFA. You can construct an automaton for every positive and negative example input. Then calculate the union for all positive examples and the union for all negative examples. Finally, calculate the difference between the two unions and convert the resulting automaton back to a regex.
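A rough sketch of that idea in Python, under a simplifying assumption: each example automaton accepts exactly one string, so the union is just a finite set, the difference is a set difference, and the resulting DFA can be represented as a trie. The function names are made up for illustration, and this is not a general DFA library.

```python
import re


def build_trie(words):
    """Build a trie; for a finite set of strings this is a tree-shaped DFA."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # the empty key marks an accepting state
    return root


def trie_to_regex(node):
    """Convert the trie back to a regex by alternating over outgoing edges."""
    accepting = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not accepting:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if accepting else body


def synthesize(positives, negatives):
    """Union of positives minus union of negatives, converted to a regex."""
    language = set(positives) - set(negatives)
    if not language:
        return "(?!)"  # a pattern that never matches
    return trie_to_regex(build_trie(language))


pattern = synthesize(
    ["14:51 [info] 51 some message", "15:22 [error] 24 error!"],
    [],
)
print(pattern)
print(re.fullmatch(pattern, "16:03 [info] 3 other") is not None)
```

The result matches the examples and nothing else: the unseen line at the end is rejected. So with single-string automata this approach memorizes the inputs rather than generalizing, and any generalization would have to come from somewhere else.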
It shouldn't be hard to start with .* and recursively split it in two parts that still match the input strings, but I believe you will end up with matching but useless regexes.
There's research [5][6] as well as practical tools [7][8][9].
[1] https://en.wikipedia.org/wiki/Program_synthesis
[2] https://www.microsoft.com/en-us/research/project/program-syn...
[3] https://dl.acm.org/doi/10.1145/1836089.1836091
[4] https://royalsocietypublishing.org/doi/10.1098/rsta.2015.040...
[5] https://cs.stanford.edu/~minalee/pdf/gpce2016-alpharegex.pdf
[6] https://www.researchgate.net/publication/261794574_Automatic...
[7] https://regex-generator.olafneumann.org/
[8] http://regex.inginf.units.it/extract/
[9] https://stackoverflow.com/questions/6219790/need-a-regex-too...
% bundle exec bin/regexgen '14:51 [info] 51 some message' '15:22 [error] 24 error!'
(?-mix:1(?:4:51\ \[info\]\ 51\ some\ message|5:22\ \[error\]\ 24\ error!))
With enough inputs it should end up with something somewhat reasonable for the leading part, but it will never be smart enough to understand that the error message is "arbitrary" and should be matched with e.g. `(.+)`.
EDIT: Ah no, sorry. Was thinking of the other way around[0].
In *nix land, a decent reference is built-in: `man awk` and jump to the "Regular Expressions" section.
POSIX: man 7 re_format
PCRE: man pcrepattern
PCRE2: man pcre2pattern
PCRE may be a "works on my machine" thing, but POSIX should be there.
All words starting with the letter "n" and ending with "g", case insensitive
>> /^nw*g$/i
This is not right. That should be a "\w" instead of "w".
Prompt: "URL regex"
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
Prompt: "Email regex"
/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))
This one's weirdly sophisticated for some reason.
Prompt: "An HTML tag with a close tag and attributes"
<([a-zA-Z]+)[^>]*?(?<!\/)>.*?<\/\1>
I'd wager that one probably misses a bunch of corner cases.
Obviously trained on a data set that didn't include <h1>
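The `<h1>` jab checks out, at least in Python's `re` (a sketch; the HTML snippets are made up). The tag name group only allows letters, so it captures `h`, and the backreference then looks for `</h>`.

```python
import re

# The generated HTML tag pattern, copied as-is.
TAG = re.compile(r"<([a-zA-Z]+)[^>]*?(?<!\/)>.*?<\/\1>")


def first_match(html: str):
    """Return the first matched element text, or None if nothing matches."""
    m = TAG.search(html)
    return m.group(0) if m else None


print(first_match('<p class="x">hi</p>'))        # matches the whole element
print(first_match("<h1>Title</h1>"))             # None: digits are not allowed in the tag name
print(first_match("<div><div>a</div></div>"))    # stops at the first closing tag
```

The nested `<div>` case is one of the corner cases: the lazy `.*?` ends the match at the first `</div>`, leaving the outer element unbalanced.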
This whole thread is why I'm not scared of Copilot or similar taking away jobs anytime soon, since bug fixing is way harder than writing code
You’ll still have to fully comprehend the auto-generated regex to make sure that it really does what you want. So the tool may help with coming up with a suitable regex, but doesn’t remove the need for comprehension.
\"object\": \[(.*?)\]
----
The regular expression matches a string that contains "object": followed by a space and an open bracket, then any characters, then a close bracket. The characters between the open and close bracket are captured in a group.
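A small Python sketch of that pattern in use, with made-up input strings. It also shows one case outside the "90%": because the capture is lazy, a nested bracket cuts it short.

```python
import re

# The pattern from above, copied as-is.
OBJECT_LIST = re.compile(r'\"object\": \[(.*?)\]')


def capture(text: str):
    """Return the captured text between the brackets, or None."""
    m = OBJECT_LIST.search(text)
    return m.group(1) if m else None


print(capture('{"object": [1, 2, 3], "other": [4]}'))  # 1, 2, 3
print(capture('{"object": [[1], [2]]}'))               # [1
```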
Pretty cool. Does 90% of what I need.
Also gave a pycon talk - https://www.youtube.com/watch?v=Zugbqg9HFHQ
It was fun and achieved good results, but did not need a monster like GPT3!
However, why do I need to sign up to test it? I'm guessing there is some paywall after. This feels like that GitHub Copilot rug pull.
Instead of saying it will match the string "email", it says the opposite, english -> regex conversion
> regex = /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
> This regular expression is used to match an email address. It starts by matching any character that is not a space or the "@" symbol, followed by the "@" symbol, then any characters that are not a space or the "." symbol, followed by a "." character, and finally any characters that are not a space.
Kidding of course, looks really cool!
> English → RegEx
> 95
> GO
> \w+@\w+\.\w+
That's an interesting email regex.
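To see how "interesting", here is a minimal Python check against a few made-up addresses, assuming the whole string has to match. `\w` does not cover dots or plus signs, so anything beyond `name@host.tld` fails.

```python
import re

# The generated email pattern, copied as-is.
EMAIL = re.compile(r"\w+@\w+\.\w+")

# Hypothetical addresses for illustration.
addresses = [
    "jane@example.com",
    "john.doe@example.com",
    "jane@mail.example.com",
    "jane+tag@example.com",
]

results = {addr: EMAIL.fullmatch(addr) is not None for addr in addresses}
for addr, ok in results.items():
    print(addr, ok)
```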
What does it do for queries like "all male English names" or "comfortable temperature range"?
(\d+\.?\d*)\s?-\s?(\d+\.?\d*)
And for "all male English names", it generates: [A-Z][a-z]+
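A quick Python sketch of what those two outputs actually accept, using made-up inputs. The range pattern has no notion of a sign, and the name pattern accepts any capitalized word.

```python
import re

# Both patterns copied as-is from the outputs above.
RANGE = re.compile(r"(\d+\.?\d*)\s?-\s?(\d+\.?\d*)")
NAME = re.compile(r"[A-Z][a-z]+")


def parse_range(text: str):
    """Return the two captured numbers as strings, or None."""
    m = RANGE.search(text)
    return m.groups() if m else None


print(parse_range("18-24"))        # ('18', '24')
print(parse_range("20.5 - 22.5"))  # ('20.5', '22.5')
print(parse_range("-5 - 3"))       # ('5', '3'): the minus sign is lost
print(NAME.fullmatch("Table") is not None)  # True, though it is not a name
```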
The first one might be good, but the latter seems rather unsophisticated.
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
which is similar, but has some interesting differences. That shows it's black-box association. Then I tried "my email address" and this came out, line break included:
is john.doe@example.com
My email address is \w+\.\w+@\w+\.\w+
"An identifier" vs "a Javascript identifier" does work as expected, but "a number" and "a floating point number" don't. "A quoted string" doesn't escape the quotes inside, but if you add "with escaped quotes" it does.
So, it's cute, and might set you on the right track, as long as you study the output a bit.
The example regex is completely wrong, it should be /n[a-z]*g/ig
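For comparison, a Python sketch of three variants on a made-up sentence: the pattern as displayed on the site, the one suggested here (with `re.findall` standing in for the `g` flag), and a word-boundary version that is my own guess at the intent of the prompt.

```python
import re

# Hypothetical sample sentence.
text = "Nothing is running; nag the king about napping."

as_displayed = re.findall(r"^nw*g$", text, re.IGNORECASE)
suggested = re.findall(r"n[a-z]*g", text, re.IGNORECASE)
bounded = re.findall(r"\bn\w*g\b", text, re.IGNORECASE)

print(as_displayed)  # []
print(suggested)     # ['Nothing', 'nning', 'nag', 'ng', 'napping']
print(bounded)       # ['Nothing', 'nag', 'napping']
```

Without word boundaries the suggested pattern also picks up fragments inside "running" and "king", so some form of `\b` (or splitting into words first) seems necessary for "all words".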
> any word that has a penultimate character of a
> .*[a-z]n$
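A short Python check of that output against a few made-up words. The `ALTERNATIVE` pattern is only my guess at what the prompt was asking for, not something the tool produced.

```python
import re

# The generated pattern, copied as-is.
GENERATED = re.compile(r".*[a-z]n$")
# Assumption: a hand-written guess at the intended meaning (second-to-last character is "a").
ALTERNATIVE = re.compile(r"^\w*a\w$")

words = ["cat", "bean", "turn"]

generated_hits = [w for w in words if GENERATED.search(w)]
alternative_hits = [w for w in words if ALTERNATIVE.search(w)]

print(generated_hits)    # ['bean', 'turn']
print(alternative_hits)  # ['cat', 'bean']
```

The generated pattern really tests for "ends in n after a lowercase letter", so it accepts "turn" and rejects "cat".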