digits: digit: charset "0123456789"
rule: [
thru "$"
some digits
"."
digit
digit
]
parse "$10.00" rule ;; true
pattern: [
some "p"
2 "q" any "q"
]
new-rule: [
2 pattern
]
parse "pqqpqq" new-rule ;; true
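For comparison, here is a rough Python `re` sketch of both rules (the translation is mine, not from the thread, and assumes `thru "$"` means "skip forward through the first `$`"):

```python
import re

# thru "$" some digits "." digit digit  ->  skip to "$", then digits, dot, two digits
money = re.compile(r'.*?\$\d+\.\d\d')
print(bool(money.fullmatch("$10.00")))   # True

# pattern: [some "p"  2 "q" any "q"], repeated twice  ->  (p+ q{2,}) x2
pattern = re.compile(r'(?:p+q{2,}){2}')
print(bool(pattern.fullmatch("pqqpqq")))  # True
```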
Rebol doesn't have regular expressions; instead it comes with a parse dialect, which is a TDPL: http://en.wikipedia.org/wiki/Top-down_parsing_language
Some parse refs: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Feat... | http://www.rebol.net/wiki/Parse_Project | http://www.rebol.com/r3/docs/concepts/parsing-summary.html
TIL
Although Rebol can be used for programming,
writing functions, and performing processes,
its greatest strength is the ability to
easily create domain-specific languages or
dialects.
— Carl Sassenrath [Rebol author]
https://en.wikipedia.org/wiki/Rebol
http://reference.wolfram.com/language/ref/StringExpression.h...
Something like that would be
StringExpression[
"$",
Repeated[DigitCharacter],
".",
DigitCharacter,
DigitCharacter
]
or StringExpression[
"$",
Repeated[DigitCharacter],
".",
Repeated[DigitCharacter, {2}]
]
or StringExpression[
"$",
NumberString
]
and the other is StringExpression[
Repeated[
StringExpression[
Repeated["p", {1, Infinity}],
Repeated["q", {2, Infinity}]
],
{2}
]
]
This can be made more concise since StringExpression has an infix form (~~) and Repeated can sometimes be replaced by postfix ..
Always, not sometimes. ;-)
'$10.00' ~~ rx{ \$ \d+ \. \d\d };
my $pat = rx{ p+ q ** 2..* }; 'pqqpqq' ~~ rx{ <$pat> ** 2 }
Note that these "regexes" are syntax, not strings, checked and converted into a hybrid DFA/NFA at compile time.
Regex may be ugly, but you lose something important when you move from declarative to imperative.
I've "learned" regular expressions multiple times but it just never sticks, I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).
I'd prefer to write regexes in this style; however, I would pay attention to performance (not that regular expressions are high-performance, but I wouldn't want to see a large performance loss either).
Modern regular expression engines in a lot of languages actually go beyond the expressiveness of a regular language. This is what damages performance.
There is no reason why this would reduce performance... if it's not doing anything crazy.
If anything, you're taking work away from it. You're building the tree directly here, whereas a parser would normally build a tree from the string. But since this is integrating into the language's RE library, I'm guessing it's writing that tree as a string, which is then passed into the regular expression engine, to be turned into a tree again :)
If a regular expression runs too often, even pre-compiled (as it should be), you'll want to replace it with code written in the native language. I've gone in and replaced a one-line search/replace written as a compiled regex with just a C-style for() loop over the wchar array, and had the memory usage drop by nearly 80% and performance increase by over 60%.
So "high performance" is all relative. However, regex isn't something I'd describe that way, even compiled. It is a nice way to write complex string-parsing code quickly, though.
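A minimal Python sketch of the same trade-off (a hypothetical hot path, not the commenter's actual wchar code): a pre-compiled pattern versus a plain character loop that produces the same result while avoiding the regex engine entirely.

```python
import re

# Hypothetical hot path: strip digits from a string.
DIGITS = re.compile(r'\d')          # pre-compiled once, as it should be

def strip_digits_re(s: str) -> str:
    # One-line search/replace via the regex engine
    return DIGITS.sub('', s)

def strip_digits_loop(s: str) -> str:
    # The same operation as a plain loop -- no regex engine involved
    return ''.join(ch for ch in s if not ch.isdigit())

assert strip_digits_re("a1b2c3") == strip_digits_loop("a1b2c3") == "abc"
```

Whether the loop actually wins depends on the language and runtime; the point is only that a one-line regex can hide a lot of machinery.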
As the name suggests though, the focus was on passphrase criteria, and it wasn't meant to produce a DSL for general regex building. The library also supports named templates and a few utility methods.
As for syntax, there's the fluent syntax (chained methods), and there's the query syntax which is syntactic sugar that gets compiled to the methods. The query syntax is probably the biggest reason people mistake LINQ for being SQL specific since it resembles SQL.
E.g.,
var results = SomeCollection.Where(c => c.SomeProperty < 10)
.Select(c => new { c.SomeProperty, c.OtherProperty });
The same thing in query syntax:

var results = from c in SomeCollection
where c.SomeProperty < 10
select new { c.SomeProperty, c.OtherProperty };
Then you can iterate over both the same way:

foreach (var result in results)
{
Console.WriteLine(result);
}

```
(?xi)
\b
(                                  # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                   # URL protocol and colon
    (?:
      /{1,3}                       # 1-3 slashes
      |                            # or
      [a-z0-9%]                    # Single letter or digit or '%'
                                   # (Trying not to match e.g. "URI::Escape")
    )
    |                              # or
    www\d{0,3}[.]                  # "www.", "www1.", "www2." … "www999."
    |                              # or
    [a-z0-9.\-]+[.][a-z]{2,4}/     # looks like domain name followed by a slash
  )
  (?:                              # One or more:
    [^\s()<>]+                     # Run of non-space, non-()<>
    |                              # or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                              # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                              # or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)
```
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or "%"

https://www.debuggex.com/r/EpocMU_7Fq_B_p9z
edit:
wait, I thought about it for a second and I see what you meant. You're not saying it's wrong, you're saying it's obvious.
I wasn't sure if it was obvious because I wasn't sure if {1,3} was supposed to be {1-3} and there was a mistake in the expression, or if there was some kind of unexpected error in the [a-z0-9%] expression.
Because even in this simple example, there is room for error.
(?xi)
\b
( # Capture 1: entire matched URL
(?:
[a-z][\w-]+: # URL protocol and colon
(?:
/{1,3} # 1-3 slashes
| # or
[a-z0-9%] # Single letter or digit or '%'
# (Trying not to match e.g. "URI::Escape")
)
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)

https://github.com/perl6-community-modules/uri/blob/master/l...
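In Python, the `(?xi)` prefix corresponds to the `re.VERBOSE | re.IGNORECASE` flags. A trimmed sketch using only the `www` branch of the pattern above (not the full regex):

```python
import re

url = re.compile(r'''
    \b
    www\d{0,3}[.]        # "www.", "www1." ... "www999."
    [a-z0-9.\-]+         # domain-name-ish run of characters
    ''', re.VERBOSE | re.IGNORECASE)

m = url.search("see www2.example.com for details")
print(m.group())   # www2.example.com
```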
e.g.
(: (or (in ("az")) (in ("AZ")))
(* (uncase (in ("az09")))))

Look, I know it takes a while, but once you get the hang of it, you won't need any crutches to write regular expressions. The only tool that's really needed is a way to rigorously test a regular expression to make sure it does what it needs to do, and there are a ton of those around.
I see regex like this: if you have to use it often enough, better to learn it as it is; that will be more helpful in the long run. If you don't use regex too often, then just google your question; there's a very high chance that somebody already wrote a regex for your problem or a similar one.
Only tools I ever use are regex testers (like regexr.com) when I need to make sure that pattern works correctly.
While I prefer writing regexes, a regex DSL isn't fundamentally better or worse, just different. In addition, it allows non-computer people to write, or at least specify, regexes in a way that makes more sense to non-developers.
The particular syntax we use (which is not that great) is not THE "regular expressions"; it is just one syntax we arrived at.
That is, the "regular expressions" name doesn't refer to the syntax, but to the concept.
These web-based tools can do it:
https://www.debuggex.com/r/Yxqws81Uif-BGBN8
Important note - this is built up programmatically, it's not just a string dumped in a parser!
I get that some people have a hard time understanding regexps, with all the backtracking and greediness. Yes, the syntax is a bit complicated. Maybe a simplified, predictable default mode could help. But there is no problem with a DSL being used as an abstraction. In fact, we need more DSLs, for everything!
(compound "$" (1+ :digit) "." :digit :digit)
Run: $ txr -p "(regex-compile '(compound \"$\" (1+ :digit) \".\" :digit :digit))"
#/$\d+\.\d\d/
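The same build-the-tree-then-render idea can be sketched in Python with a toy compiler (the node names mirror the TXR example above, but the function itself is invented for illustration):

```python
import re

def compile_rx(node):
    """Render a tiny tuple-based pattern tree to a regex string."""
    if node == ":digit":
        return r"\d"
    if isinstance(node, str):
        return re.escape(node)          # literal text, escaped
    head, *args = node
    if head == "compound":              # concatenation
        return "".join(compile_rx(a) for a in args)
    if head == "1+":                    # one or more
        return "(?:" + "".join(compile_rx(a) for a in args) + ")+"
    raise ValueError(f"unknown node: {head}")

rx = compile_rx(("compound", "$", ("1+", ":digit"), ".", ":digit", ":digit"))
print(rx)                               # \$(?:\d)+\.\d\d
assert re.fullmatch(rx, "$10.00")
```

The pattern is built as data and only rendered to regex syntax at the end, which is the property the comments above are after.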