Worse yet, even if you did come up with this problem on your own, this exact problem was a fairly common interview question back in the mid-90s, when string-processing interview questions were all the rage for C/C++ programming jobs -- I must have seen it a dozen times in various interviews over the years. The problem is familiar enough that I'd bet a decent amount of money it's listed in one of those interview-problem books that were popular before the websites for this stuff started showing up over the past couple of years. If you really relied on the interviewee having no possible advance knowledge of this question prior to the interview, you surely had a false sense of security well before seeing it appear on Glassdoor.
As long as you engage the interviewee enough to see that s/he really understands the answers they are giving, I don't really see why it matters whether the question has appeared on one of these sites. If an interviewee prepares to the point of actually studying these sites and understanding the answers well enough to discuss them intelligently, that probably already puts them in the upper tier of people you'd seriously consider hiring, so retiring the question is probably counter-productive unless you have a non-disclosed alternative that you're sure is as good a question.
Maybe I'm just getting cynical in my old age, but sometimes I think the world is awash in incompetence. We see so much of it in tech because our incompetencies are (marginally) harder to hide.
A very candid Philip Greenspun in Founders at Work (pp. 341-2): "In aviation, by the time someone's your employee, their perceived skill and actual skill are reasonably in line. JFK Jr. is not working at a charter operation because he's dead. In IT, you have people who think 'I am a great programmer.' But where are the metrics that prove them wrong? Traffic accidents are very infrequent, so they don't get the feedback that they are a terrible driver.
Programmers walk around with a huge overestimate of their capabilities. That's why a lot of them are very bitter. They sit around stewing at their desks. That's why I don't miss IT, because programmers are very unlikable people. They are not pleasant to manage."
So, assuming a real-world input and a real-world dictionary, you can try plenty of things that break down for a dictionary that includes four hundred A's but are perfectly valid solutions in practice. Also, if you want a fast real-world solution, sticking with a pure lookup dictionary will slow things down. Being able to toss 5 letters into a lookup table that returns the longest and shortest words starting with those letters would save a lot of time. Basically, 'xxyzy' = (null, null) saves you from checking candidate words from length 5 up to the maximum word length in your dictionary. Secondly, sanitizing inputs is going to be both safer and faster. And anything you are coding in 45 minutes is going to be a long way from production-ready code.
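A rough Python sketch of that prefix-table idea (the function name and sample words are mine, purely illustrative):

```python
# For each prefix, record the (shortest, longest) word lengths that begin
# with it; a prefix missing from the table means no dictionary word starts
# that way, so a search can bail out instead of probing every length.

def build_prefix_table(words, prefix_len=5):
    table = {}
    for w in words:
        p = w[:prefix_len]
        lo, hi = table.get(p, (len(w), len(w)))
        table[p] = (min(lo, len(w)), max(hi, len(w)))
    return table

table = build_prefix_table(["apple", "applet", "pie"])
print(table.get("xxyzy"))  # None: nothing starts with this, stop immediately
print(table["apple"])      # (5, 6): only word lengths 5..6 need checking here
```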
PS: Even with backtracking you can code an O(n) solution for very large input sets. Just keep track of the K best output sequences, i.e. those ending at N, N-1, N-2, ..., N-K.
Nobody ever answered that question correctly and I used to joke that if anyone ever did we'd hire them immediately.
NB: I've long since stopped using that kind of trivia question in interviewing, but I didn't know any better at the time.
print("\n".join("%s: %s" % (key, value) for key, value in mydict.items()))
? If so, is it hard to do in Java? Norvig gives a very approachable version of English word segmentation that uses a language model below.
Given a list of all the short strings in the periodic table of elements (e.g. Na, F, Al, etc.) and a list of all the words in the English language: 1) Write a method that finds the longest possible English word you can spell given any combination of the strings in the periodic table of elements. Re-use of elements in the same string is allowed. 2) Describe what kind of data types you would want for the two lists and describe anything special about them. 3) Give a big-O estimation.
I thought it was a great question :)
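For what it's worth, part 1 reduces to the same segmentation question with the element symbols as the dictionary. A hedged Python sketch, with tiny stand-in lists instead of the real periodic table and word list:

```python
from functools import lru_cache

symbols = {"na", "f", "al", "c", "o", "n", "b", "ca", "ba", "ar"}  # sample symbols, lowercased
words = ["bacon", "carbon", "fan"]  # stand-in for the English word list

def spellable(word):
    """True if `word` can be written as a concatenation of element symbols."""
    @lru_cache(None)
    def ok(i):
        if i == len(word):
            return True
        # Element symbols are 1-3 characters long.
        return any(i + L <= len(word)
                   and word[i:i + L] in symbols
                   and ok(i + L)
                   for L in (1, 2, 3))
    return ok(0)

longest = max((w for w in words if spellable(w.lower())), key=len, default=None)
print(longest)  # carbon (C-Ar-B-O-N); bacon (Ba-C-O-N) also works but is shorter
```

For part 2, a hash set gives O(1) symbol lookups; for part 3, with memoization each word of length n costs roughly O(k·n) substring checks, where k is the maximum symbol length.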
However, I would have been able to come up with an alternative O(n^2) solution, involving building a directed graph with vertices representing each position in the string and the end of the string, with edges connecting two vertices if there is a dictionary word starting at the position corresponding to one vertex and ending just before the position corresponding to the second vertex. This can be done in O(n^2), and then you can find the shortest path from the vertex for 0 to the end vertex on this graph in O(n^2) (e.g., by Dijkstra's algorithm), and that gives you an exact covering of the string using the minimal number of dictionary words.
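A rough Python rendering of that construction (since all edges have equal weight, plain BFS already finds a fewest-words path; Dijkstra works too but is not required):

```python
from collections import deque

def min_word_cover(s, dictionary):
    """Segment s into the fewest dictionary words, or None if impossible."""
    n = len(s)
    # Vertex i = position i; edge i -> j whenever s[i:j] is a word.
    edges = {i: [j for j in range(i + 1, n + 1) if s[i:j] in dictionary]
             for i in range(n)}
    prev = {0: None}          # BFS tree, used to reconstruct the path
    q = deque([0])
    while q:
        i = q.popleft()
        if i == n:
            break
        for j in edges[i]:
            if j not in prev:
                prev[j] = i
                q.append(j)
    if n not in prev:
        return None
    parts, j = [], n
    while j:
        parts.append(s[prev[j]:j])
        j = prev[j]
    return parts[::-1]

print(min_word_cover("applepie", {"a", "apple", "pie", "applepie"}))  # ['applepie']
```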
EDIT: A naive implementation of your idea can take up to O(n^3 m) time, where m is the number of words in the dictionary. Using a fancy data structure like a trie, suffix tree, or automaton can speed it up to O(n^2). Can we do better than that?
re.findall("(your|dict|here)", "yourword")
I like the idea of constructing a state machine to do all the matching.

A colleague of mine (now hired) was asked to code a string compare during her interview for a .NET position. She said "String.Compare()". This puzzled the interviewers for a while. They asked her whether she could write it out; she said she didn't see the point.
They ended up hiring her for other reasons (notably, references from trusted colleagues who had worked with her), but I still wonder whether that attitude worked for her or against her.
>>> re.findall('(a|aa|aaa|ab)', 'aaab') ['a', 'a', 'a']
The correct answer would be ['aa', 'ab'], but unfortunately findall works greedily and so will not find the optimal solution. It is possible to specify this as a regex, but common implementations might take too much time to arrive at the good solution.
I'm missing something, clearly, but my initial idea was a state machine that contained all the words (sorted by length, perhaps, but that's not necessary if we don't care about the solution with the "most words") and linked back to itself. It's essentially the same as the backtracking example, from a cursory glance.
Of course, findall is a dumb version that only finds disjoint substrings.
EDIT: It turns out that the group count in regexps is fixed, so it's not possible to return all matches of a single group, even if it's repeated. All my solution above does is show you whether there's a match or not, but not what it is (unless it's the single word).
Q: What if there are multiple valid segmentations? A: Just return any valid segmentation if there is one.
(Of course, ['a', 'a', 'a'] is not a valid answer, because the trailing 'b' is not part of any word and is simply left out of the result.)
>>> re.compile('|'.join(open('/usr/share/dict/words').read().split()), re.I)
OverflowError: regular expression code size limit exceeded
For smaller dicts, it should work fine, but it evidently doesn't do so well on larger ones.

import Data.List (find, inits, tails)
import Data.Set (Set, member)

-- Note: as written, this only splits the input into at most two words.
segment_string :: String -> Set String -> Maybe String
segment_string [] _ = Nothing
segment_string str dict =
  if str `member` dict
    then Just str
    else let pairs = zip (inits str) (tails str)
             pairInDict (x, y) = x `member` dict && y `member` dict
         in do (x, y) <- find pairInDict pairs
               Just (x ++ " " ++ y)

import Control.Monad (guard)
import Data.Array (listArray, (!))
import Data.List (inits, intercalate, tails)
import Data.Maybe (listToMaybe)
import Data.Set (Set)
import qualified Data.Set as Set

type Dict = Set String
hasWord :: Dict -> String -> Bool
hasWord = flip Set.member
fmtWords :: [[String]] -> Maybe String
fmtWords = fmap (intercalate " ") . listToMaybe
splitWordsSimple :: Dict -> String -> Maybe String
splitWordsSimple dct = fmtWords . go where
  go [] = return []
  go s = do
    (i,t) <- zip (inits s) (tails s)
    guard $ hasWord dct i
    (i:) <$> go t

splitWordsDP :: Dict -> String -> Maybe String
splitWordsDP dct s = fmtWords $ a ! 0 where
  a = listArray (0, len) $ map f [0..len-1] ++ [[[]]]
  len = length s
  f i = do
    l <- [1..len-i]
    let w = take l $ drop i s
    guard $ hasWord dct w
    (w:) <$> a ! (i+l)
let dict = Data.Set.fromList ["apple", "pie", "bread", "applepie", "piebread"]
let fn q = concat . Data.List.map (\(x,y) -> if member x dict then (if y == "" then [[x]] else Data.List.map (x:) (fn y)) else []) $ tail $ zip (inits q) (tails q)
fn "applepiebread"
[["apple","pie","bread"],["apple","piebread"],["applepie","bread"]]
(Yes, not a perfect analogy, but it hopefully gives the idea.)
No, the advanced answers are a simple application of dynamic programming. If you've never heard of dynamic programming before, you're unlikely to invent it in response to an interview question, of course; but if you have heard of it, it might occur to you to try it on this problem.
(Actually, if you've heard of memoization but not dynamic programming, you might invent dynamic programming in response to this question.)
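For concreteness, here is roughly what that memoized recursion looks like in Python (a sketch, not anyone's reference solution from the thread):

```python
from functools import lru_cache

def segment(s, words):
    """Return one valid segmentation of s into dictionary words, or None."""
    @lru_cache(None)
    def go(i):
        if i == len(s):
            return []
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in words:
                rest = go(j)          # each suffix is solved at most once
                if rest is not None:
                    return [s[i:j]] + rest
        return None
    return go(0)

print(segment("applepie", {"apple", "pie"}))  # ['apple', 'pie']
```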
I think this is at the opposite end of the spectrum from your CSS example. Dynamic programming has nothing to do with string processing or with any other particular domain. There's a list of 29 significant algorithms that apply it at http://en.wikipedia.org/wiki/Dynamic_programming#Algorithms_.... It might qualify as "deep knowledge", but it's not deep domain knowledge; it's the kind of deep knowledge that would make you want to hire someone from a different domain.
Really? Seems like a standard DP question to me
As for feeding them page rank, I don't think I have so much to offer that it helps them materially.
My first amusing thought when reading this was to assume a correlation; i.e. that interviewees who lean heavily on Glassdoor prior to interviewing do not eventually become interviewers.
My suggested solution: don't wrap it in PRE tags with manual line-breaks. It's not code, so why preserve the exact breaks? Try BLOCKQUOTE - I don't know if it's widely supported anymore - or just italicize the whole thing.
I don't really have a good solution for what to do about the actual code, though. :(
But people with that background will give good answers, even if they haven't seen _this specific problem_, because they have seen lots of problems like it and recognize the pattern. And even in that case, we evaluate them based on how well they can implement the pattern they saw, not just on whether they recognized the correct algorithm. So what if they've seen this problem already? Coding it up efficiently and elegantly in an interview context is still non-trivial, and you can still push them to discuss edge cases and performance tradeoffs.
The person who really has _never_ seen anything like this in his life, and still can give a good answer, I have yet to meet.
Can I get some stats? I really don't (want to) believe it. What percentage of people get this question wrong? Are they all some sort of eng/cs graduate? I'm not even a coder and I can solve this in a few minutes.
When applicants prepare for an interview, they do not often know what kind of knowledge to load into their heads. For example, just a couple of weeks ago I was asked to figure out a simple bit-flipping scheme, and bit-string manipulations are something I have not thought about in many years. So it took me about 10 minutes to do a problem that I would have done in less than a minute back when I was spending time thinking about similar things and my mind was full of them.
Being prepared for a technical interview does not mean to have memorized a few solutions to a few problems, but it means to have played with them sufficiently to have the brain loaded with the material. This helps with intuition as well as specific technical skills.
So I looked at the code, it used the efficient solution. Now I am even more impressed with the programmer. I've always thought highly of him (better than me) but it's hard to evaluate someone better than you. His solution ran circles around mine (I had general simple case with 2 words) and now I know exactly how much more efficient his solution was. Very neat.
So even though you are retiring this one, coming up with something similar that tests for basically the same things shouldn't be impossible.
If you have a lot of similar interview questions, then there's no way anyone other than a savant can memorize them without actually learning the theory.
That's why I'm working on an approach that assumes the candidate has prior knowledge of the problem. But I'm not there yet.
An outline usually includes at least the definition of the 'stages' of the dynamic programming solution. For the problem of 'string segmentation', the obvious selection of stages would be each of i = 1, 2, ..., n for the given string of length n. But this definition of the stages does not yield a dynamic program because the solution at stage i needs more than just the solution at stage i + 1 and, indeed, potentially needs the solutions at each of stages i + 1, i + 2, ..., n.
So, first-cut, there is no dynamic programming solution. For a second-cut, there might be a dynamic programming solution if the author would outline one!
There is now some question if the Google interviewers really understand dynamic programming!
There may be a problem with what you outline: That substring [i, j) is a word is not so impressive! In addition we need to know, for i < p < j < q, is [p, q] a word. That is, even if [i,j) is a word, we can't conclude yet that it will be a word in the final, feasible 'segmentation' of the whole string. Indeed, even if [j, n] 'segments', we can't yet conclude that those segments will be in the final, feasible segmentation of the whole string [1, n].
Maybe in the core of the work what we want to know is, for each i = n, n - 1, ..., 1, is there a 'segmentation' of [i,n]. To tell this, at i, consider j so that i < j <= n and ask if [i, j) is a word AND at the same time if [j,n] can be 'segmented'. If so, then say that [i,n] can be segmented. So, in this, first we ask if [j,n] can be segmented since we have already addressed that issue and recorded the result, true or false.
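A sketch of that backward loop in Python (0-based indices; whether to call it dynamic programming or memoization is precisely what is in dispute here):

```python
def can_segment(s, words):
    n = len(s)
    seg = [False] * (n + 1)
    seg[n] = True                       # the empty suffix segments trivially
    for i in range(n - 1, -1, -1):      # i = n-1, ..., 0
        # s[i:] segments iff some word s[i:j] is followed by a
        # segmentable suffix s[j:], whose answer is already recorded.
        seg[i] = any(s[i:j] in words and seg[j] for j in range(i + 1, n + 1))
    return seg[0]

print(can_segment("applepie", {"apple", "pie"}))   # True
print(can_segment("applepies", {"apple", "pie"}))  # False
```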
So, the break with dynamic programming is that at stage i, we have to look at the results of not just stage i + 1 but at the results of each j for j = i + 1, i + 2, ..., n, which breaks the basic 'recursion' that is at the core of dynamic programming. That is, a key feature of dynamic programming is that at stage i, to find what to do, need only the 'state' at stage i and, for each possible state at stage i + 1, what to do at stage i + 1. That is, at stage i, do not have to look at stages i + 2, i + 3, etc.
Also for your approach, you are not working just at stage i but also with [j, i) for 1 <= j < i which is again a break with dynamic programming where we regard each i = 1, 2, ..., n as a stage.
No matter what some common practice is, just saying dynamic programming without being explicit about the stages, states, state transition functions, and the recurrence is sloppy to the point of risking often being wrong. This is not nearly the first time that computing took something solid and precise from applied math and made a mess. Messes, even if common, are not good. You will see a lot of care and discipline in Dreyfus and Law along with essentially all the applied math literature on dynamic programming, back to Nemhauser, Bellman, and others.
It's a cute string exercise with some cute solutions, but it's just not dynamic programming.
Yes, a memoization solution to the segmentation problem might work in a loop on i for i = n, n - 1, ..., 1, and the usual backward recurrence in dynamic programming does also, but that similarity does not mean that the loop is dynamic programming.
For more, dynamic programming has stages, states, state transition functions, and a basic recurrence relationship. The recurrence does the work at stage i just from the work at stage i + 1; the string problem does NOT do this. In the case of uncertainty, we also need essentially a Markov assumption.
Again, the string problem is cute, and there is at least one cute solution, but it's not dynamic programming.