Worse yet, even if you did come up with this problem on your own, this exact problem was a fairly common interview question back in the mid-90s, when string-processing interview questions were all the rage for C/C++ programming jobs -- I must have seen it a dozen times in various interviews over the years. The problem is familiar enough that I'd bet a decent amount of money it's listed in one of those interview-problem books that were popular before the websites for this stuff started showing up over the past couple of years. If you really relied on the interviewee having no possible advance knowledge of this question prior to the interview, you surely had a false sense of security well before seeing it appear on Glassdoor.
As long as you engage the interviewee enough to see that s/he really understands the answers they are giving, I don't really see why it matters whether the question has appeared on one of these sites. If an interviewee prepares to the point of actually studying these sites and understanding the answers well enough to discuss them intelligently, that probably already puts them in the upper tier of people you'd seriously consider hiring, so retiring the question is probably counter-productive unless you have a non-disclosed alternative that you're sure is as good a question.
Maybe I'm just getting cynical in my old age, but sometimes I think the world is awash in incompetence. We see so much of it in tech because our incompetencies are (marginally) harder to hide.
A very candid Philip Greenspun in Founders at Work (pp. 341-2): "In aviation, by the time someone's your employee, their perceived skill and actual skill are reasonably in line. JFK Jr. is not working at a charter operation because he's dead. In IT, you have people who think 'I am a great programmer.' But where are the metrics that prove them wrong? Traffic accidents are very infrequent, so they don't get the feedback that they are a terrible driver.
Programmers walk around with a huge overestimate of their capabilities. That's why a lot of them are very bitter. They sit around stewing at their desks. That's why I don't miss IT, because programmers are very unlikable people. They are not pleasant to manage."
So, assuming a real-world input and a real-world dictionary, you can try plenty of things that break down for a dictionary that includes four hundred A's but are perfectly valid solutions in practice. Also, if you want a fast real-world solution, sticking with a pure lookup dictionary will slow things down. Being able to toss 5 letters into a lookup table that returns the longest and shortest words starting with those letters would save a lot of time. Basically, 'xxyzy' = (null, null) saves you from checking candidate words from length 5 up to the maximum word length in your dictionary. Secondly, sanitizing inputs is going to be both safer and faster. And anything you are coding in 45 minutes is going to be a long way from production-ready code.
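A rough Python sketch of that prefix-table idea (the function name and sample words are mine, purely illustrative):

```python
# For each prefix, record the (shortest, longest) word lengths that begin
# with it; a prefix missing from the table means no dictionary word starts
# that way, so a search can bail out instead of probing every length.

def build_prefix_table(words, prefix_len=5):
    table = {}
    for w in words:
        p = w[:prefix_len]
        lo, hi = table.get(p, (len(w), len(w)))
        table[p] = (min(lo, len(w)), max(hi, len(w)))
    return table

table = build_prefix_table(["apple", "applet", "pie"])
print(table.get("xxyzy"))  # None: nothing starts with this, stop immediately
print(table["apple"])      # (5, 6): only word lengths 5..6 need checking here
```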
PS: Even with backtracking you can code an O(n) solution for very large input sets. Just keep track of the K best output sequences, i.e. those ending at N, N-1, N-2, ..., N-K.
Nobody ever answered that question correctly and I used to joke that if anyone ever did we'd hire them immediately.
NB: I've long since stopped using that kind of trivia question in interviewing, but I didn't know any better at the time.
print("\n".join("%s: %s" % (key, value) for key, value in mydict.items()))
? If so, is it hard to do in Java? Norvig gives a very approachable version of English word segmentation that uses a language model below.
Given a list of all the short strings in the periodic table of elements (e.g. Na, F, Al, etc.) and a list of all the words in the English language: 1) Write a method that finds the longest possible English word you can spell given any combination of the strings in the periodic table of elements. Re-use of elements in the same string is allowed. 2) Describe what kind of data types you would want for the two lists and describe anything special about them. 3) Give a big-O estimation.
I thought it was a great question :)
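For what it's worth, part 1 reduces to the same segmentation question with the element symbols as the dictionary. A hedged Python sketch, with tiny stand-in lists instead of the real periodic table and word list:

```python
from functools import lru_cache

symbols = {"na", "f", "al", "c", "o", "n", "b", "ca", "ba", "ar"}  # sample symbols, lowercased
words = ["bacon", "carbon", "fan"]  # stand-in for the English word list

def spellable(word):
    """True if `word` can be written as a concatenation of element symbols."""
    @lru_cache(None)
    def ok(i):
        if i == len(word):
            return True
        # Element symbols are 1-3 characters long.
        return any(i + L <= len(word)
                   and word[i:i + L] in symbols
                   and ok(i + L)
                   for L in (1, 2, 3))
    return ok(0)

longest = max((w for w in words if spellable(w.lower())), key=len, default=None)
print(longest)  # carbon (C-Ar-B-O-N); bacon (Ba-C-O-N) also works but is shorter
```

For part 2, a hash set gives O(1) symbol lookups; for part 3, with memoization each word of length n costs roughly O(k·n) substring checks, where k is the maximum symbol length.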
However, I would have been able to come up with an alternative O(n^2) solution, involving building a directed graph with vertices representing each position in the string and the end of the string, with edges connecting two vertices if there is a dictionary word starting at the position corresponding to one vertex and ending just before the position corresponding to the second vertex. This can be done in O(n^2), and then you can find the shortest path from the vertex for 0 to the end vertex on this graph in O(n^2) (e.g., by Dijkstra's algorithm), and that gives you an exact covering of the string using the minimal number of dictionary words.
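A rough Python rendering of that construction (since all edges have equal weight, plain BFS already finds a fewest-words path; Dijkstra works too but is not required):

```python
from collections import deque

def min_word_cover(s, dictionary):
    """Segment s into the fewest dictionary words, or None if impossible."""
    n = len(s)
    # Vertex i = position i; edge i -> j whenever s[i:j] is a word.
    edges = {i: [j for j in range(i + 1, n + 1) if s[i:j] in dictionary]
             for i in range(n)}
    prev = {0: None}          # BFS tree, used to reconstruct the path
    q = deque([0])
    while q:
        i = q.popleft()
        if i == n:
            break
        for j in edges[i]:
            if j not in prev:
                prev[j] = i
                q.append(j)
    if n not in prev:
        return None
    parts, j = [], n
    while j:
        parts.append(s[prev[j]:j])
        j = prev[j]
    return parts[::-1]

print(min_word_cover("applepie", {"a", "apple", "pie", "applepie"}))  # ['applepie']
```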
EDIT: A naive implementation of your idea can take up to O(n^3 m) time, where m is the number of words in the dictionary. Using a fancy data structure like a trie, suffix tree, or automaton can speed it up to O(n^2). Can we do better than that?
re.findall("(your|dict|here)", "yourword")
I like the idea of constructing a state machine to do all the matching.

A colleague of mine (now hired) was asked to code a string compare during her interview for a .NET position. She said "String.Compare()". This puzzled the interviewers for a while. They asked her whether she could write it out; she said she didn't see the point.
They ended up hiring her for other reasons (notably, references from trusted colleagues who had worked with her), but I still wonder whether that attitude worked for her or against her.
>>> re.findall('(a|aa|aaa|ab)', 'aaab') ['a', 'a', 'a']
The correct answer would be ['aa', 'ab'], but unfortunately findall works greedily and so will not find the optimal solution. It is possible to specify this as a regex, but common implementations might take too much time to arrive at the good solution.
I'm missing something, clearly, but my initial idea was a state machine that contained all the words (sorted by length, perhaps, but that's not necessary if we don't care about the solution with the "most words") and linked back to itself. It's essentially the same as the backtracking example, from a cursory glance.
Of course, findall is a dumb version that only finds disjoint substrings.
EDIT: It turns out that the group count in regexps is fixed, so it's not possible to return all matches of a single group, even if it's repeated. All my solution above does is show you whether there's a match or not, but not what it is (unless it's the single word).
Q: What if there are multiple valid segmentations? A: Just return any valid segmentation if there is one.
(Of course, ['a', 'a', 'a'] is not a valid answer, because the trailing 'b' is not part of any word and is simply left out of the result.)
>>> re.compile('|'.join(open('/usr/share/dict/words').read().split()), re.I)
OverflowError: regular expression code size limit exceeded
For smaller dicts, it should work fine, but it evidently doesn't do so well on larger ones.

import Data.List (find, inits, tails)
import Data.Set (Set, member)

-- Note: as written, this only splits the input into at most two words.
segment_string :: String -> Set String -> Maybe String
segment_string [] _ = Nothing
segment_string str dict =
  if str `member` dict
    then Just str
    else let pairs = zip (inits str) (tails str)
             pairInDict (x, y) = x `member` dict && y `member` dict
         in do (x, y) <- find pairInDict pairs
               Just (x ++ " " ++ y)

import Control.Monad (guard)
import Data.Array (listArray, (!))
import Data.List (inits, intercalate, tails)
import Data.Maybe (listToMaybe)
import Data.Set (Set)
import qualified Data.Set as Set

type Dict = Set String
hasWord :: Dict -> String -> Bool
hasWord = flip Set.member
fmtWords :: [[String]] -> Maybe String
fmtWords = fmap (intercalate " ") . listToMaybe
splitWordsSimple :: Dict -> String -> Maybe String
splitWordsSimple dct = fmtWords . go where
  go [] = return []
  go s = do
    (i,t) <- zip (inits s) (tails s)
    guard $ hasWord dct i
    (i:) <$> go t

splitWordsDP :: Dict -> String -> Maybe String
splitWordsDP dct s = fmtWords $ a ! 0 where
  a = listArray (0, len) $ map f [0..len-1] ++ [[[]]]
  len = length s
  f i = do
    l <- [1..len-i]
    let w = take l $ drop i s
    guard $ hasWord dct w
    (w:) <$> a ! (i+l)
let dict = Data.Set.fromList ["apple", "pie", "bread", "applepie", "piebread"]
let fn q = concat . Data.List.map (\(x,y) -> if member x dict then (if y == "" then [[x]] else Data.List.map (x:) (fn y)) else []) $ tail $ zip (inits q) (tails q)
fn "applepiebread"
[["apple","pie","bread"],["apple","piebread"],["applepie","bread"]]
(Yes, not a perfect analogy, but it hopefully gives the idea.)
No, the advanced answers are a simple application of dynamic programming. If you've never heard of dynamic programming before, you're unlikely to invent it in response to an interview question, of course; but if you have heard of it, it might occur to you to try it on this problem.
(Actually, if you've heard of memoization but not dynamic programming, you might invent dynamic programming in response to this question.)
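For concreteness, here is roughly what that memoized recursion looks like in Python (a sketch, not anyone's reference solution from the thread):

```python
from functools import lru_cache

def segment(s, words):
    """Return one valid segmentation of s into dictionary words, or None."""
    @lru_cache(None)
    def go(i):
        if i == len(s):
            return []
        for j in range(i + 1, len(s) + 1):
            if s[i:j] in words:
                rest = go(j)          # each suffix is solved at most once
                if rest is not None:
                    return [s[i:j]] + rest
        return None
    return go(0)

print(segment("applepie", {"apple", "pie"}))  # ['apple', 'pie']
```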
I think this is at the opposite end of the spectrum from your CSS example. Dynamic programming has nothing to do with string processing or with any other particular domain. There's a list of 29 significant algorithms that apply it at http://en.wikipedia.org/wiki/Dynamic_programming#Algorithms_.... It might qualify as "deep knowledge", but it's not deep domain knowledge; it's the kind of deep knowledge that would make you want to hire someone from a different domain.
Really? Seems like a standard DP question to me
As for feeding them page rank, I don't think I have so much to offer that it helps them materially.
My first amusing thought when reading this was to assume a correlation; i.e. that interviewees who lean heavily on Glassdoor prior to interviewing do not eventually become interviewers.
My suggested solution: don't wrap it in PRE tags with manual line-breaks. It's not code, so why preserve the exact breaks? Try BLOCKQUOTE - I don't know if it's widely supported anymore - or just italicize the whole thing.
I don't really have a good solution for what to do about the actual code, though. :(
But people with that background will give good answers, even if they haven't seen _this specific problem_, because they have seen lots of problems like it and recognize the pattern. And even in that case, we evaluate them based on how well they can implement the pattern they saw, not just on whether they recognized the correct algorithm. So what if they've seen this problem already? Coding it up efficiently and elegantly in an interview context is still non-trivial, and you can still push them to discuss edge cases and performance tradeoffs.
The person who really has _never_ seen anything like this in his life, and still can give a good answer, I have yet to meet.
Can I get some stats? I really don't (want to) believe it. What percentage of people get this question wrong? Are they all some sort of eng/cs graduate? I'm not even a coder and I can solve this in a few minutes.
When applicants prepare for an interview, they do not often know what kind of knowledge to load into their heads. For example, just a couple of weeks ago I was asked to figure out a simple bit-flipping scheme, and bit-string manipulations are something I have not thought about in many years. So it took me about 10 minutes to do a problem that I would have done in less than a minute back when I was spending time thinking about similar things and my mind was full of them.
Being prepared for a technical interview does not mean to have memorized a few solutions to a few problems, but it means to have played with them sufficiently to have the brain loaded with the material. This helps with intuition as well as specific technical skills.
So I looked at the code, it used the efficient solution. Now I am even more impressed with the programmer. I've always thought highly of him (better than me) but it's hard to evaluate someone better than you. His solution ran circles around mine (I had general simple case with 2 words) and now I know exactly how much more efficient his solution was. Very neat.
So even though you are retiring this one, coming up with something similar that tests for basically the same things shouldn't be impossible.
If you have a lot of similar interview questions, then there's no way anyone other than a savant can memorize them without actually learning the theory.
That's why I'm working on an approach that assumes the candidate has prior knowledge of the problem. But I'm not there yet.
An outline usually includes at least the definition of the 'stages' of the dynamic programming solution. For the problem of 'string segmentation', the obvious selection of stages would be each of i = 1, 2, ..., n for the given string of length n. But this definition of the stages does not yield a dynamic program because the solution at stage i needs more than just the solution at stage i + 1 and, indeed, potentially needs the solutions at each of stages i + 1, i + 2, ..., n.
So, first-cut, there is no dynamic programming solution. For a second-cut, there might be a dynamic programming solution if the author would outline one!
There is now some question if the Google interviewers really understand dynamic programming!
There may be a problem with what you outline: That substring [i, j) is a word is not so impressive! In addition we need to know, for i < p < j < q, is [p, q] a word. That is, even if [i,j) is a word, we can't conclude yet that it will be a word in the final, feasible 'segmentation' of the whole string. Indeed, even if [j, n] 'segments', we can't yet conclude that those segments will be in the final, feasible segmentation of the whole string [1, n].
Maybe in the core of the work what we want to know is, for each i = n, n - 1, ..., 1, is there a 'segmentation' of [i,n]. To tell this, at i, consider j so that i < j <= n and ask if [i, j) is a word AND at the same time if [j,n] can be 'segmented'. If so, then say that [i,n] can be segmented. So, in this, first we ask if [j,n] can be segmented since we have already addressed that issue and recorded the result, true or false.
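A sketch of that backward loop in Python (0-based indices; whether to call it dynamic programming or memoization is precisely what is in dispute here):

```python
def can_segment(s, words):
    n = len(s)
    seg = [False] * (n + 1)
    seg[n] = True                       # the empty suffix segments trivially
    for i in range(n - 1, -1, -1):      # i = n-1, ..., 0
        # s[i:] segments iff some word s[i:j] is followed by a
        # segmentable suffix s[j:], whose answer is already recorded.
        seg[i] = any(s[i:j] in words and seg[j] for j in range(i + 1, n + 1))
    return seg[0]

print(can_segment("applepie", {"apple", "pie"}))   # True
print(can_segment("applepies", {"apple", "pie"}))  # False
```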
So, the break with dynamic programming is that at stage i, we have to look at the results of not just stage i + 1 but at the results of each j for j = i + 1, i + 2, ..., n, which breaks the basic 'recursion' that is at the core of dynamic programming. That is, a key feature of dynamic programming is that at stage i, to find what to do, need only the 'state' at stage i and, for each possible state at stage i + 1, what to do at stage i + 1. That is, at stage i, do not have to look at stages i + 2, i + 3, etc.
Also for your approach, you are not working just at stage i but also with [j, i) for 1 <= j < i which is again a break with dynamic programming where we regard each i = 1, 2, ..., n as a stage.
No matter what some common practice is, just saying dynamic programming without being explicit about the stages, states, state transition functions, and the recurrence is sloppy to the point of risking often being wrong. This is not nearly the first time that computing took something solid and precise from applied math and made a mess. Messes, even if common, are not good. You will see a lot of care and discipline in Dreyfus and Law along with essentially all the applied math literature on dynamic programming, back to Nemhauser, Bellman, and others.
It's a cute string exercise with some cute solutions, but it's just not dynamic programming.
Yes, a memoization solution to the segmentation problem might work in a loop on i for i = n, n - 1, ..., 1, and the usual backward recurrence in dynamic programming does also, but that similarity does not mean that the loop is dynamic programming.
For more, dynamic programming has stages, states, state transition functions, and a basic recurrence relationship. The recurrence does the work at stage i just from the work at stage i + 1; the string problem does NOT do this. In the case of uncertainty, we also need essentially a Markov assumption.
Again, the string problem is cute, and there is at least one cute solution, but it's not dynamic programming.