There's a lot of good insight in the article about the correct way to approach the problem, but asking anyone to come up with it on the spot is unrealistic. You have the benefit of having seen the problem before with time on your side to reflect on it. They haven't.
When I do interviews like this, I prefer to talk through the problem with the candidate, as if we were actual teammates working on it together. That more closely resembles life on the job, which to me is the point of interviewing someone.
The "is this person good enough for me" interview allows geniuses who are assholes through. I prefer to filter for good teammates.
I wouldn't expect everyone to see the optimizations immediately -- you can sort the days individually, you can drop all but a small number of pages per customer -- but I would be disappointed if someone couldn't be hinted there by the end of a 30 minute interview.
Failing to even get the O(n log n) solution tells me that someone should never have graduated.
Well yes, but that's not what this is about. It's about social pressure, not ability.
I could program this for you right now if I wanted to; most "optimisations" seemed the "obvious ones" to me, so I guess I would have passed. I could also program it in a hurry at ludicrous speed under pressure because all of production is down or whatever.
But ... I wouldn't be able to program this in an interview setting.
Last time I was asked a bunch of even easier things (in a timed interview, no less), where I literally have the exact code on my GitHub. I just couldn't finish it. Not because I can't "get" it, but because programming while two strangers are judging your every keystroke and commenting on what you're doing is just nerve-wracking.
Or, basically this sketch: https://www.youtube.com/watch?v=kQa5NsdYSts
This is something I wonder about in my comment sidethread - to me, the natural solution is the O(n) one in O(n) space.
I see that the O(n log n) solution requires O(1) space to run. But it requires O(n) space to return a result! Is that actually better than O(n) time and O(n) space?
I think that rather, I would have done well because I know after doing hundreds of these both as the interviewer and the interviewee that using some sort of Map along with some vague rumblings of “space/time complexity tradeoff” is very often the solution for these types of questions. So I would have immediately gone there first by instinct.
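As a sketch of that map-first instinct in Python (the whitespace-separated "customer page ..." log line format is an assumption, not something the article specifies):

```python
from collections import defaultdict

def loyal_customers(day1_lines, day2_lines, min_pages=2):
    """Customers who appear on both days and visited >= min_pages unique pages."""
    # Map each customer to the set of unique pages they visited on day 1.
    day1_pages = defaultdict(set)
    for line in day1_lines:
        customer, page = line.split()[:2]   # assumed format: "customer page ..."
        day1_pages[customer].add(page)

    loyal = set()
    day2_pages = defaultdict(set)
    for line in day2_lines:
        customer, page = line.split()[:2]
        if customer in day1_pages:          # must have been seen on day 1 too
            day2_pages[customer].add(page)
            # Union of pages across both days must reach the threshold.
            if len(day1_pages[customer] | day2_pages[customer]) >= min_pages:
                loyal.add(customer)
    return loyal
```

One pass per file, one Map lookup per record: the "space/time tradeoff" reflex in about 15 lines.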
Only the basics that are close to everyday programming work: write a for-loop, know what a Map/dict is, and Google for “how to read a file”. If a candidate can’t do that, they can’t really program. ChatGPT can probably code this answer.
The coding exercise will involve writing tests, design of classes, working with floating point arithmetic, working across thread barriers, some fairly standard maths, and so on.
I actually can't say we've hired a bad candidate yet, and they all report that the interview is a positive experience.
In my fairly long career I've been through about 10-15 interview processes. A few of them were quite toxic and those were the "just cram leetcode for months" type. I never want to inflict that on someone else.
If a colleague came to me to discuss a question this basic, I'd want them PIPed out. This is miles from the complexity of problems that require collaboration. It's a shell pipeline that I'd write with sort/uniq in less than five minutes. I just did it to make sure my estimate wasn't wrong:
sort <(sort -u <(cut -d' ' -f1,2 /tmp/day1) <(cut -d' ' -f1,2 /tmp/day2) | cut -d' ' -f1 | uniq -d) <(sort <(cut -d' ' -f1 /tmp/day1 | sort | uniq) <(cut -d' ' -f1 /tmp/day2 | sort | uniq) | uniq -d) | uniq -d
Maintainable? No. Done iteratively in less than five minutes? Yes. If your perspective is that it's unrealistic to ask anyone to come up with an answer to this question on the spot, you should take a long, hard look at yourself and the quality of your colleagues.
You didn't ask the question you were supposed to ask. :)
Senior in a perfect world, sure, but the reality is that for decades people have been desperate for talent, so you don't have to do much to get by.
I knew a popular principal engineer at a Fortune 50 company who was loved by execs. He would check in 5 or 10 copies of the same file. The opposite of the DRY principle.
> Occasionally, a candidate will think they’re done at this point. 90% of the times that a candidate was done without questioning the quadratic nature of that solution, the final outcome of the loop was No Hire. So that’s another signal for me.
I would have been one of those candidates. The author said they didn't like tricky questions and wanted a signal on how the candidate might approach real-world problems. Well, this is indeed tricky -- unless you drop a bunch of constraints at the beginning. For a real-world project, I would just use all the resources I can access to finish it. I am not going to go the extra mile to optimize it in every possible way. Premature optimization can be evil. I provided the solution, it works and meets all your requirements, so I am done.
Want me to make it fast/memory efficient? You have to say it. Forgot to mention it in the first iteration? No problem, cut me a ticket and I will see if I can sneak it into my next sprint.
It wouldn't even occur to me to go with the naive O(n^2) solution because it has such obvious drawbacks with large inputs.
And it's an interview question... yes, you're getting a No-Hire if you just leave it there. Although I personally would prompt you if you're not interviewing for a senior position.
In real life people write a lot of O(n^2) code without realizing it, and usually it's just some unnecessary loop inside another loop. I want to know that you care about some things.
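A tiny illustration of the kind of accidental quadratic loop meant here, next to the linear fix:

```python
# Accidental O(n^2): list membership inside a loop re-scans the list each time.
def dedupe_quadratic(items):
    out = []
    for x in items:
        if x not in out:      # O(n) scan per item -> O(n^2) overall
            out.append(x)
    return out

# Same result in O(n): a set makes each membership check O(1).
def dedupe_linear(items):
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

Both look equally innocent in a code review; only the second survives a large table in prod.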
I don't understand why this has to be a red flag. What's wrong with just saying "it works, but can you make it faster"? As an interviewer, I say this a lot and I never judge candidates on this at all.
A realistic dataset for the problem they described could easily be tens of millions of records even if you're not a Google-sized site.
> Want me to make it fast/memory efficient? You have to say it. Forgot to mention it in the first iteration? No problem, cut me a ticket and I will see if I can sneak it into my next sprint.
It’s probably a lot easier to just not hire people who deliver poor quality work in the first place. Then you don’t have to worry about whether or not they can go back and fix it later.
"How large can the file potentially be" would be my first question. Depending on the size I might even throw sqlite, an RDBMS, or even a full text search engine at the problem. Or I might not, it depends on the actual scenario.
But if your description is to keep it simple I'm going to do that in good faith.
Me: Child, please go wash your hands.
Child: rinses hands under cold running water for 2 seconds
Me: With soap! And warm water! For more than two seconds!
Child: You want me to wash my hands with soap? You have to say it.
The soap is independently helpful, but it does occur to me to wonder how much of the additional value it brings is due to the automatic requirement to spend more time with your hands in the water, lathering up and rinsing off.
Me: Should I wash both hands?
Parent: Yes.
Me: Should I wash them in any order or the order is important?
Parent: In any order.
Me: Should I wash them fast or thoroughly? I need to ask as that may affect the way I'm solving this interview problem.
They fester and/or blow up regularly otherwise.
There are some features that will never be used at any significant scale.
E.g. "this is a one-off report" and "ok, how would you do it differently if we wanted to turn this into a feature at scale?"
It's great if an engineer gets there on their own, but there are so many tangents one could go on that I won't ding someone for picking the wrong one, or asking for the most relevant one.
I would expect seniors not to need that level of guidance on the job, with the caveat that expectations should be reasonably set in design docs, tickets, various bits of context, etc.
What if this is a one-off to produce a business report? Would it make sense to use programmer time to create an O(n) structure in memory, or just loop through the files line by line and let the CPU take a minute or five, or thirty? What is the programming language - something that has a library for this or something very low level where we’d read the file byte by byte?
If we’re dealing with the latter, a small amount of data, and a one off report, I don’t care at all in my work whether an engineer I’m managing somehow writes it in O(n^3).
It’s interesting how quick the author is to judge - ask this question for points, don’t even think about that, don’t mention arrays because they’re fixed size (despite implementations of dynamically allocated arrays totally existing, and the candidate might be coming from one), and so on. Some humility would be nice.
Although I think what they wrote is very valuable, as this is how many interviews go. And I have to at least appreciate the author’s approach for trying to start a conversation, even if he still takes a rather reductive approach to evaluating candidates.
The attempts to circumstantially excuse away shitty answers go against the desire to just not be shit.
The author here seemed able and willing to talk through constraints & issues. Their first paragraphs practically begged for it, as a sign of maturity. Rather than just excuse away shitty solutions, my hope is, even if you are not a super competent can-do coder, you at least can talk through and walk through problems with people that are trustable, to arrive at reasonably competent capable answers.
> About 80% of the candidates go for the naive solution first. It’s easiest and most natural.
The “naive solution” will be easier to understand and maintain. Why make it harder if it doesn’t add value?
But even more importantly, with a slightly better solution I don’t get woken up in the night once a week because some buffoon left behind a hundred of these little performance landmines that worked great when the table had ten rows on their dev box but causes an outage in prod the second we hit some critical threshold of data.
This takes all sorts of forms from people using quadratic time or quadratic memory to my personal favorite: pulling entire database tables across the network to do basic analytics.
The authors always have the same excuse, that “it worked good enough for what was needed at the time”, ignoring the simple fact that the version which wouldn’t have caused an outage would have been just as easy from day one.
Unfortunately people don’t think about actual engineering cost of “optimal” solutions. Engineering costs are part of operational costs and need to be juxtaposed against compute.
You can get a lot more mileage out of running the largest EC2 instance for a year vs hiring a junior engineer.
> In theory you could write a pretty simple SQL query, and sure, Big Tech companies have giant data warehouses where you can easily do this sort of thing. But for the scope of a coding interview, you wouldn’t want to. Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
My first thought on an actual implementation was, if this is a one-off request, to import it into sqlite. No need to set up a big system, and I think it would be easier/faster than writing those 20 lines of code. Also a hell of a lot easier to iterate on minor spec tweaks like the unique pages overall vs per day clarification. And probably less likely to have off-by-one type of bugs, since the simple logic is handled by the database itself. Bonus, it does handle the case where the dataset is larger than memory!
...
> I’ve actually expressed the problem in an ambiguous way.
So when other people do it, hard and tricky questions are bad, but when you deliberately set your candidate up for failure by withholding concrete information, that's clever and insightful. Got it.
Or more productively put: The author obviously enjoys tearing down simple questions with complex implications (one often does in the sw field) and reflects over their candidates, but seemingly lacks the self-reflection to understand what makes questions hard or tricky and why interviewers like to pick them.
I don't see the author's withholding of all problem parameters as tricky at all, but an attempt to accurately mimic the ambiguities of the real world to see what the candidate does with it.
Especially for more junior people it's selecting on confidence, because especially in a hiring setting they don't want to "seem stupid" by asking questions. I guess this is also a problem for more senior people. It's just a different setting than "actual job".
If you want to ask if anything could be clarified then ask.
To me the worst part is the nonsense rationale to argue away using a database to store and query this data. Taken from the article:
> In theory you could write a pretty simple SQL query, and sure, Big Tech companies have giant data warehouses where you can easily do this sort of thing. But for the scope of a coding interview, you wouldn’t want to. Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
The blogger's problem statement explicitly mentions the system needs to track business metrics. Business metrics include things like clickstream metrics and the task of analyzing business metrics is exploratory in nature and requires dashboarding. Only an utter moron would look at those requirements and say "hey, this is a task for logs dumped onto a text file". There is no way in hell that this task does not involve in the very least a database.
This is yet another example of a technical recruiter evaluating candidates, and rejecting them, because they did not present the wrong solution to the wrong problem that the recruiter somehow convinced himself was the right one.
Will the data fit in memory? Well, of course it will, it's an interview... oh you expected me to ask you anyway?
I should obviously load both files into the hashmap, that way it works for an arbitrary amount of files instead of just two... oh, you expected me to write a solution for literally the exact problem you stated without considering its practicality? Even though before you wanted the opposite, when you asked about the algorithmic complexity?
Guess I'm failing.
But you are right that optimizing specifically for two files seems wrong, especially as the data contains a timestamp, so you could simplify the problem and ignore the number of files completely.
> Now, given two N log files we want to generate a list of ‘loyal customers’ that meet the criteria of: (a) they came on ALL days, and (b) they visited at least L unique pages.
while keeping linear time complexity and
O(num records in first file * L)
memory complexity. It is not too different from the solution given in the article (just use 3 maps instead of two). That means multiple files don't need out-of-core processing, as long as the maps for one file at a time fit in memory.
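A sketch of that multi-file generalization in Python (here the maps are rolled into one candidates map that is rebuilt per day; the "customer page" line format and the page-set cap at L are assumptions):

```python
def loyal_customers(files, L):
    """Customers present in every file who visited >= L unique pages overall.

    `files` is a list of iterables of "customer page" lines, one per day.
    Memory is bounded by the candidates surviving from the first day, with
    at most L pages kept per candidate.
    """
    candidates = None   # customer -> set of pages seen so far (capped at L)
    for day in files:
        seen_today = {}
        for line in day:
            customer, page = line.split()[:2]
            if candidates is None:
                pages = seen_today.setdefault(customer, set())
            elif customer in candidates:
                pages = seen_today.setdefault(customer, candidates[customer])
            else:
                continue  # missed an earlier day, can never be loyal
            if len(pages) < L:  # cap: we only need to know we reached L
                pages.add(page)
        candidates = seen_today  # only customers seen every day survive
    return {c for c, pages in (candidates or {}).items() if len(pages) >= L}
```

Each day is a single pass, customers absent on any day are dropped immediately, and the per-customer page set never grows past L.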
"explain select" is a cool source of interview questions :)
If you're consistently getting reliable answers and finally decide to build a system for these types of reports, clearly this guy's real-world experience on Amazon's Clickstream product is going to be far more valuable than whatever anyone who is brand new to the problem can come up with, even if they choose the "correct" algorithm from the start.
Because I bet you that for most real world products that create more than a single fixed format report, you actually want your data set up in a completely different way than what you initially thought. You'll probably learn, for example, that you want to aggregate data per week instead of per day. Or perhaps you want to link this data to an internal users database, or perhaps your boss wants a notification when new data is added. Or perhaps you'll learn that loading it into a single 1GB SQLite DB solves your problem without even needing to think about any algos.
My team builds out a lot of one-off projects for customers, and a lot of them don't need to be especially performant. That has always been the case for reporting/analytical projects.
You know what does make a difference though? The amount of development effort spent on the solution. The most performant solution can very easily cost the company much more in terms of development time spent. So in that sense, it's sub-optimal.
Grep, uniq, wc, and a few others can be treated as pipeline data transformers to answer questions like this interview question. As long as you make some smart decisions about the order of operations, you can usually get performance on par with what you might write custom code for.
I would love to interview a candidate that can show they can use command line tools effectively.
Don't get me wrong - it's a totally fair question, frankly one I would have been happy to receive when I was interviewing for those roles.
I'm also a fan of whiteboarding coding interviews in general as a way of evaluating talent so no objections there.
There was just something about this specific question that struck me as boring, soulless - like, who cares? I think my objection might be that it too closely resembles a menial task I might actually be given - something that I hope to God the upcoming LLM advances automate away.
"they don’t know the size of the data upfront", okay. I spend one scan finding the highest customer number and probably make some 10MB index-associated buffer. If I'm fancy I find the range and use offset indices to reduce the overall size. You already said it fits in memory and I'm not a distributed programmer. Space is cheap in boring reality.
I guess it's one of those cool brain teasers that gets you excited to use your skills from college. Not many get to in reality. Or they prefer other domain-specific skills.
A cleaner solution would be to load both days 1 and 2 into the same hashmap. Then you can iterate the map and count whatever condition you want.
My design was to create two hash maps, one for customer to a list of days and one for customer to list of pages, though after reading the article I realized my lists should really be sets. Then you can easily account for any change to the definition of a loyal customer, as all you need to do is use two O(1) lookups and then check the size of the lists. Easy, flexible, and little room for error.
Especially when the question scenario is generating "business metrics" which tend to see a lot of tweaking and iteration.
Having engineers who make contextually appropriate designs and architectures is at least as important as having engineers who are math-whizzes.
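The two-hash-map design described above can be sketched in Python (assuming records have already been parsed into (customer, day, page) tuples; thresholds are parameters so the "loyal" definition can be tweaked):

```python
from collections import defaultdict

def find_loyal(records, required_days=2, min_pages=2):
    """records: iterable of (customer, day, page) tuples."""
    days = defaultdict(set)    # customer -> set of days visited
    pages = defaultdict(set)   # customer -> set of unique pages visited
    for customer, day, page in records:
        days[customer].add(day)
        pages[customer].add(page)
    # Loyal: visited on enough distinct days AND enough unique pages.
    return {c for c in days
            if len(days[c]) >= required_days and len(pages[c]) >= min_pages}
```

Changing the definition of "loyal" is just a change to the two thresholds, which is exactly the flexibility argued for here.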
Any experienced engineer should have no trouble with it. There's no hiding here - the candidate just needs to deliver something working, something relatively clean, and be reasonably pleasant to pair with. No leetcode grinding necessary, though I have found that those who did well on this problem also generally got high scores from my colleagues who do ask LC questions.
It's really just a scenario with some mockup third-party API docs, where the applicant needs to write some pseudocode that checks different conditions, arranges the data, and ties together the different calls.
It might not be testing every possible skill the applicant has, but at least it's in-line with one of the tasks we actually expect them to perform regularly.
"Let us hire this person because they didn't solve the problem but were a great conversationalist problem solver" - said nobody at a standardized hiring committee!
The first answer that popped into my head was a shell pipeline, "cat file1 file2 | sort -k [pattern for customer ID] | awk -f ..." where the awk part just scans the sort output and checks for both dates and two pages within each customer-ID cluster. So maybe 10 lines of awk. It didn't occur to me to use hash tables. Overall it seems like a lame problem, given how big today's machines are: 10,000 page views per second for 2 days, crunching each record into 64 bits, means you can sort everything in memory. If 10 million views per second then maybe we can talk about a hadoop cluster. But 10k a second is an awfully busy site.
I actually had a real-life problem sort of like this a while back, with around 500 million records, and it took a few hours using the Unix command line sort utility on a single machine. That approach generally beat databases solidly.
That is basically how everything worked back when 1MB was a lot of memory. The temp files were even on magtape rather than disk. Old movie clips of computer rooms full of magtape drives jumping around, were probably running a sorting procedure of some type. E.g. if you had a telephone in the 1960s, they ran something like that once a month to generate your phone bill with itemized calls. A lot of Knuth volume 3 is still about how to do that.
These days you'd do very large sorting operations (say for a web search engine indexing 1000's of TB of data) with Hadoop or MapReduce or the like. Basically you split the data across 1000s of computers, let each computer do its own sorting operation so you can use all the CPU's and RAM at the same time, and then do the final merge stage between the computers over fast local networks.
I've used the Unix sort program on inputs as large as 500GB and it works fine with a few GB of memory. It does take a while, but so what.
The data structures look sensible and it did most of what the interviewer wanted on the first try.
It did make the wrong initial assumption that we wanted 2 unique pages per day. When prompted with a clarification, it made a sensible fix.
When asked to optimize, it went for big hammers like parallel processing and caching. As opposed to saving memory by only storing one file in the data structure as the author discussed.
https://aider.chat/share/?mdurl=https://gist.github.com/paul...
Lots of good software engineers will have the instinctive knowledge of good or costly solutions without mapping it to BigO.
Also, what is funny is that, in my opinion, Big O is used and required by people who want to look smart but are not necessarily so. Because what we do with Big O is really limited. Almost nowhere will the discussion go further than O(1), O(log n), O(n), O(n^2). Because beyond that the maths becomes hard to understand. But in my opinion, algorithm complexity goes way beyond that when you use it in real life.
Maybe you don't work in the right places then.
I suppose this is the step where I become a "poor candidate". I think it's important to acknowledge changing client requirements at this point. Sure, loading both files into memory is less memory efficient, but it's much easier to tweak this algorithm later if you do. You can change it to count over 3 different days, or 2 days in a 5 day time period, or any number of other things. You can save some memory if you don't, but you'll arrive at a solution that is much less flexible.
I mean the real solution is to load all the data into a database of course, but even given the constraints of the problem I'd still argue for loading each entire file into memory as the more general and flexible solution, for when our pretend clients inevitably change their pretend minds.
>you don’t need to actually keep every single page from Day 1 in the Map, just two, since the problem is “at least two pages” so a Set of size 2 or even an array of size 2 will use less memory than an unbounded Set.
And with this I think we've crossed over from the practical to leetcode. At this point you're asking the candidate to add a bunch of new code paths (each one should be tested) and make their solution a lot less general. Going from a pretty general algorithm that can be tweaked pretty easily to something super specific with a bunch of new sources of bugs.
>Or, if you’ve already determined that a customer is loyal, you don’t need to waste CPU cycles going thru the logic again next time you encounter that customer in Day 2.
No, please load it all into your data structures properly, even if you "waste" a bit of time. All these weird little conditionals sprinkled throughout your code when you're ingesting the data are going to be sources of problems later. They might save a bit of memory, a few cycles, but they significantly increase the complexity of the code, and make a refactor or tweaks much much harder.
If this developer started doing stuff like that in an interview with me, well it would raise some red flags.
>If you want to make the problem even more interesting, you can add a third file. I will leave that as an exercise for the reader as well!
See, our imaginary customers imaginary minds did end up changing. Bet you wish you had loaded both files into memory now.
As someone going through this style of interview at the moment (but not having interviewed at Google, Microsoft or Amazon), two things jump out at me:
- If you're going to ask this question and get it done in 1 hour, does the code really matter? I'd argue that if you can get to a good or optimal solution, 99 times out of 100 you can write the code. If I got this question and didn't know better, I'd be stressing about writing the code within an hour. Knowing that we wanted to spend most of the time discussing the algos and data structures would be really useful to me. Maybe Google/Amazon/Microsoft interviews really stress this in their preamble, I don't know.
- The big "issue" I see with this question is that it relies on the interviewer knowing exactly how to steer the conversation. I think I could get to this solution with the hints, and the author seems to imply that it's ok to need a few hints. But an interviewer that doesn't know the right hints to give (or phrases them poorly) is going to turn this question into a train-wreck. This isn't an issue for the author, they clearly know this question backwards and forwards. But giving this question as a 'standard' question that others will deliver? I think it could easily end up being too conservative and cutting out a lot of otherwise smart developers.
In general, that's my criticism of this style of question: they all claim that they're about 'seeing how you think'. But I think expecting interviewers to be able to elicit a conversation that really shows 'how a candidate thinks' is much more on the interviewer rather than the interviewee. You're expecting people whose primary job is writing software to be really good at delivering interviews.
Instead, you're going to have candidates who most of the time will do well if they can pattern-matching against problems they've seen in the past, and poorly otherwise. I can see how questions like this seem good on paper, and I'm glad this question works for the author. But it's the combination of interviewer and question that makes it effective, not just the question alone. A better title for this post might be 'My favourite way of interviewing candidates', because this post is mostly to do with the author's mental model of how to run an interview with this question.
... I'm guessing I didn't get the job.
I feel that the author filtered out lots of great candidates with this problem, which might be something to pause on. On the other hand, interviewing to get a good signal is indeed a tricky business, so I can sympathize.
You can do a little better than that. Each item in your map is in exactly one of 3 states:
- We’ve seen this customer visit one unique page with (xx) url on the first day
- We’ve seen this customer visit two unique pages - but only on the first day.
- We’ve seen the customer visit one unique page (xx) and they’ve visited on both days.
In the second state you don’t actually care what the URLs are. And I think the logic for the 3rd state is identical to the logic for the 1st state - since you only add them as a “loyal customer” by visiting them again on the second day. So I think you can get away with using an Option<Url> to store the state instead of a Set or a list. (Though I’d probably use a custom parametric enum for clarity).
It’s a great problem to model using a state machine.
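A sketch of that state machine in Python (representing the per-customer state as a (State, page-or-None) pair; records are assumed to be (customer, page) tuples):

```python
from enum import Enum

class State(Enum):
    ONE_PAGE_DAY1 = 1   # seen exactly one unique page on day 1 (carry the page)
    MULTI_DAY1 = 2      # seen two+ unique pages, so the URL no longer matters

def loyal_customers(day1, day2):
    """day1/day2: iterables of (customer, page). Returns the loyal customers."""
    state = {}   # customer -> (State, page-or-None)
    for customer, page in day1:
        st = state.get(customer)
        if st is None:
            state[customer] = (State.ONE_PAGE_DAY1, page)
        elif st[0] is State.ONE_PAGE_DAY1 and st[1] != page:
            state[customer] = (State.MULTI_DAY1, None)  # URL now irrelevant

    loyal = set()
    for customer, page in day2:
        st = state.get(customer)
        if st is None:
            continue  # not seen on day 1, can't be loyal
        # Loyal if they already had 2+ pages, or day 2 brings a new page.
        if st[0] is State.MULTI_DAY1 or st[1] != page:
            loyal.add(customer)
    return loyal
```

As the comment notes, in the MULTI_DAY1 state no URL is stored at all, so per-customer memory is constant regardless of how many pages they visited.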
I don't mind O(n²) if I get good result but Amazon's search seldom gives me good results.
Same with Microsoft's search in the start menu. Doesn't find Excel if I type "exc".
I don't like the trick of failing candidates if they don't ask a question. 90% of this style of interview want candidates to rifle through solutions. If you want to talk about requirements, be explicit about it.
I'm really amazed that this "best interview" question really just boils down to leetcode for a _Senior Staff_ level interview. I don't know about y'all, but the _Senior Staff_ and _Principal_ developers I've worked with aren't wasting their time on shit like this. They're ironing out requirements. They're working with stakeholders. They're architecting systems. They're figuring out how to deliver the value the customer wants - and ensuring that it's actually what the customer wants.
-----
There's a place for performance, but the fast running turd is still a turd.
A good interview should involve more than just a coding problem. But it should absolutely require at least one coding problem. It’s mind boggling the number of “senior” people with good resumes I’ve screened out in interviews over the years because, simple as a problem like this is, they really had no idea how to even start solving it.
I don’t know about the poster, but when I’ve done interviews - especially for senior people - there are a lot of different types of assessment I’d want to do before hiring them. I’d also want to assess their social skills somehow (eg get them to present to the team about something interesting). And ask some high level systems architecture questions, talk about their background, and more.
My username at a large org is my first name. In any situation where someone wants to link to me or mention me, typing my username brings up a list of every person at the org with my name, alphabetically. My last name inevitably sorts me down toward the bottom.
You would think that an exact literal username match would have priority, but no. Typing any prefix of my name similarly sorts everyone else before me too.
What do you get? I tried just now, and `E` gives me Excel. In fact, I typed "Esc" first by accident, and I got something different for "Es" but "Esc" gave me Excel too.
“Load to a relational store and use sql” would be a reasonable answer that, I’m sure, would be acceptable in most cases.
I only know how certain SQL engines implement queries because I was tasked at that point to increase the speed of the queries, and went deep into the debugging level detail to understand the exact cost of each action.
But I haven't done that in 7 years, and couldn't tell you much more than the tools used to figure it out.
I cannot remotely understand the requirements people make up for our jobs that have nothing to do with doing our jobs.
Sorting the files can be a logarithmic operation with constant memory.
A great candidate would know that this is a log-linear operation O(n*log(n)), not a logarithmic one O(log(n)). Very low memory and compute requirements.
Map<CustomerId, Metadata>
Where Metadata is a dateVisited bitset plus a bloom filter (32/64 bits depends on how accurate you need it to be).
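A rough sketch of that compact-metadata idea in Python (the hash function and bit widths are assumptions; note the tiny bloom filter can undercount distinct pages on collisions, never overcount):

```python
import hashlib

BLOOM_BITS = 32   # tiny per-customer bloom filter for the unique-page estimate

def page_bit(page):
    # One hash -> one bit position in the 32-bit filter.
    h = int(hashlib.blake2s(page.encode(), digest_size=4).hexdigest(), 16)
    return 1 << (h % BLOOM_BITS)

def loyal_customers(records, num_days=2, min_pages=2):
    """records: iterable of (customer, day_index, page).

    Per-customer metadata is just two small ints: a day bitset and a
    32-bit bloom filter approximating the set of distinct pages.
    """
    meta = {}   # customer -> [day_bitset, page_bloom]
    for customer, day, page in records:
        m = meta.setdefault(customer, [0, 0])
        m[0] |= 1 << day          # mark the day visited
        m[1] |= page_bit(page)    # mark the page's bloom bit
    all_days = (1 << num_days) - 1
    # Popcount of the bloom is a lower bound on distinct pages, so a rare
    # hash collision can only make us miss a loyal customer, not invent one.
    return {c for c, (days, bloom) in meta.items()
            if days == all_days and bin(bloom).count("1") >= min_pages}
```

A handful of bytes per customer instead of an unbounded page set, at the cost of a small false-negative rate from bloom collisions.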
> Since this is not a distributed systems problem and the data fits in memory, why introduce the additional complexity and dependencies of a database for something that you can solve with 20 lines of simple code?
Because this is a question about getting the right data, and SQL Databases are...extremely good for filtering, sorting and grouping... data. Besides, every page visit from every client is a unique observation, and the principle of...tidy data suggests that every observation use a database row.
Why solve this with 20 lines of code, when you can solve it in a 4 line SQL query?
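For the record, a four-line query in the spirit of this comment might look like the following (sketched with sqlite3 so it's runnable; the schema, column names, and sample data are made up):

```python
import sqlite3

# In-memory table of (day, customer, page) visits -- illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (day INTEGER, customer TEXT, page TEXT)")
con.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    (1, "alice", "home"), (2, "alice", "pricing"),
    (1, "bob", "home"),   (2, "bob", "home"),
    (1, "carol", "home"),
])

# "Loyal" = appeared on both days and hit at least 2 unique pages.
rows = con.execute("""
    SELECT customer
    FROM visits
    GROUP BY customer
    HAVING COUNT(DISTINCT day) = 2 AND COUNT(DISTINCT page) >= 2
""").fetchall()
print([r[0] for r in rows])  # ['alice']
```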
It creates a team of people who are good at tests, and good in testing environments.
However, there are so many things that make an engineer great that have nothing to do with how they solve problems but who they are, how dedicated to improvement they are, etc.
But they may not be someone who:
- thinks quickly on their feet
- finds this type of situation tolerable
- may have disabilities one can't see that would make this kind of interview difficult for them
- may have personality challenges or anxiety in social situations that make interviews like this impossibly difficult
The list of reasons that “whiteboard testing interviews” don’t work well is long.
I don’t think there’s anything wrong with this approach if that’s the kind of organization you want to build. But it does tend to create homogeneity and act as a gatekeeper against “those who do not fit in.”
Some of the very best engineers I have ever hired would never make it through this interview. But they were amazing engineers who did world-class work.
How is the author determining which candidates are great? Do "great" candidates answer the questions the best, or is the interviewer following up 1-2 years after hire and examining the impact of the person?
Great candidates aren't those who can answer DS & algorithms questions the best, but it seems that the author thinks that way.
That seems overcomplicated. For each customer on day 1, you either have multiple pages or you have a single page. If you see them on day 2 and they had multiple pages on day 1, then they are loyal. Or if they had a different page on day 1 than day 2, they’re loyal. (Or two different pages on day 2, but this comes along for free.)
So the data structure can be:
map<customerid, (MultiplePages | pageid)>
Where MultiplePages is a choice in a sum type that doesn’t store any associated data. Or you can do:
map<customerid, optional<pageid>>
Where the none state of the optional means there are multiple pages, but this is a bit of an odd use of an optional.
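A minimal Python sketch of that sum-type idea, using a sentinel object to play the MultiplePages case (record shape and names are assumptions):

```python
# Sentinel standing in for the MultiplePages case of the sum type.
MULTIPLE_PAGES = object()

def loyal_customers(day1, day2):
    """day1/day2: iterables of (customer_id, page_id) visit records."""
    seen = {}  # customer_id -> single page_id, or MULTIPLE_PAGES
    for cust, page in day1:
        if cust in seen and seen[cust] != page:
            seen[cust] = MULTIPLE_PAGES   # second unique page on day 1
        else:
            seen[cust] = page

    loyal = set()
    for cust, page in day2:
        first = seen.get(cust)
        if first is None:
            continue                      # not seen on day 1
        if first is MULTIPLE_PAGES or first != page:
            loyal.add(cust)               # multiple day-1 pages, or a new page
    return loyal

print(loyal_customers([("a", "x"), ("a", "y"), ("b", "x")],
                      [("a", "x"), ("b", "x"), ("c", "z")]))  # {'a'}
```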
For reference: http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...
If I used this question I'd add to it.
"Pretend you have to do the work on an Arduino Uno, which has very few resources. Your Uno can request input and produce output from the disk where these are stored, at whatever offset you wish. The log files are 100GB each and sit on a desktop computer with a modern Linux on it. Each log line is 512B. You can create files if you need to through some unspecified protocol with the desktop computer. But the desktop computer must be dumb. It will only write to and read from disk. You can send it any disk system call you wish. Step 2: Now do it without sorting or anything absurdly slow."
The point is to ask for actual creative solutions instead of the pattern recognition problems that most of these problem formats are.
You want something that a weekend of drilling won't change the result of.
1. Find customers appearing in both log files. From this create a Set of customers that appear in both. In python it's just creating a set from the first file, creating another set from the second file, and intersecting them. O(N)
2. Concatenate both files to treat them as one. Create a Map with key: Customer and value: Set[Page]. This is basically iterating through the log: when you see a customer, add them to the map if they aren't there yet, and add the page to their set. O(N)
3. Filter the map for all customers with length(Set[Page]) > 1. To get the Set of all customers that visited more than one page. O(N)
4. Intersect the set of customers who visited multiple pages with the set of customers that appeared in both log files to get the answer. O(N). You can do this with the & operator in python.
The irony is I'm more able to do this in python than I am in SQL. For SQL stuff at work I just look it up; I never remember the exact syntax. Total runtime O(N), total memory O(N).
This is just your basic hashmap stuff for fast look up and aggregation. Usually fang questions are harder than this.
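The four steps above sketch out to roughly this (note the "both files" and final combine steps are set intersections; a union would give you customers who appeared in either file):

```python
def loyal_customers(day1_log, day2_log):
    """Each log: a list of (customer_id, page_id) records."""
    # Step 1: customers appearing in BOTH files -- set intersection, O(N).
    both_days = {c for c, _ in day1_log} & {c for c, _ in day2_log}

    # Step 2: treat both logs as one; map customer -> set of pages, O(N).
    pages = {}
    for cust, page in day1_log + day2_log:
        pages.setdefault(cust, set()).add(page)

    # Step 3: customers with more than one unique page, O(N).
    multi_page = {c for c, p in pages.items() if len(p) > 1}

    # Step 4: in both files AND multiple pages -- intersection again, O(N).
    return both_days & multi_page

day1 = [("a", "x"), ("a", "y"), ("b", "x")]
day2 = [("a", "x"), ("b", "y"), ("c", "z")]
print(sorted(loyal_customers(day1, day2)))  # ['a', 'b']
```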
But most jobs are not log file processing.
It's ridiculous to generalise this sort of thing:
"We are cooking soup here at Amazon Soup Kitchen. My favorite interview question is to ask candidates to bake a cake, that's the real test of any cook."
Finally, it's worth mentioning that while the question + answer might correlate well with the hiring decision, there's no mention of how well it predicts future performance. That said, there's survivorship bias at play, so using it against performance might be iffy.
A big part of this job is dealing with ambiguity and communication. If you feel the requirements of your task aren't clear, then go to the person who made the task and clarify them. What's the alternative, exactly? Staying silent and waiting? Wasting time implementing the wrong solution?
I've already explained simple and obvious scenarios where the context can impact the candidates in sub-optimal ways.
If someone says, "Solve this" and the candidate attempts to do so, but that's not what was really expected? That's trickery (and foolish). On the other hand, if the interviewer wants "Here's a problem, let's discuss..." then *that* is what the interviewer should lead with.
One of my teams was very surprised that I rejected a PRD for being too vague and put my foot down that it would not be picked up until specific questions were answered. They were, I would not say meek, but resigned to the inevitability of product managers pushing poorly-thought-out PRDs and not having any say in the matter.
I took my time training them to say no and spot ambiguities, and I like to think they have all become better developers and product managers for it. I always pick questions that have more than one obviously correct interpretation to see if the candidate notices it.
The idea that you trust your interviewer to provide a directly actionable question is strange. I would expect that in campus hirings, and for entry level fresh grads, but not to anyone with even a year of experience. At senior levels, it becomes more and more important to spot ambiguities and clarify them before they result in misunderstandings, wasted efforts, and worse. I trust a good interviewer to have a question that can provide them useful data points on candidates experience, skill, thought process and attitude.
Whether spotting ambiguities in the question has any correlation with future performance is harder to answer, but methodical people with attention to detail are preferable to the alternative.
If the candidate comes from an environment where questions are penalised, they would be a bad fit for a team that values and expects questioning. It is somewhat unfair, but either way, the interviewers are selecting for their preferred qualities.
I do agree the real job has to deal with ambiguity. But this is the interview; the interviewee goes in with a completely different mindset. "Playing games" to see how they respond... that's for the likes of the NSA, CIA, etc.
"We can't find good candidates"? Nah. Your hiring process sucks. Get a mirror.
What’s the old xkcd: “communicating poorly then acting smug when you’re misunderstood is not cleverness.”
I think the hard part is recognizing that a problem is ambiguous. Telling someone that a problem is ambiguous kind of defeats the point. IMO, it’s not about reading someone’s mind, but recognizing that there are multiple interpretations to what somebody has SAID. That seems less like a mind-reading technique and more like, you know, a communication skill.
I have gotten lots of ambiguous problems during my career, it seems only fair to have them appear during an interview.
Great candidates treat software like a business with changing requirements and code that is read by multiple people, poor candidates treat software like a math challenge where the only goal is to use as few resources as possible.
This is what happens with Codeforces/ACM-ICPC. Suddenly everyone is hyper-driven to crank problems out day after day under pressure, and much more interesting areas, like databases or turning a business idea into a usable app, get neglected.
Let's not fool ourselves: no one is going to solve a hard or medium LC problem under time pressure in 30 minutes unless they've seen a similar problem before, which leads to hiring worse engineers who pass interviews.
Once a metric becomes the goal it ceases to be a good metric.
Nothing about O(blah), though; that's way too specialized, deep in optimization land.
that said, coding != thinking..
Some people cannot think of a solution at all, 50% go for 3 loops one-inside-another, copy-pasted 2x2 times, a few do the simplest 2 maps with 2-3 funcs, and once a "master" invented a class with quite a few methods containing copy-paste inside :/ Probably one could do some analysis of who goes which way and why, but I never bothered.
It is also a good ground to show the different ways to add key-value to a map if it's not there - and their pros/cons (one can hook some O(blah) stuff here. i don't. There's just simpler vs faster, with variants).
And yes, having an (subtly) unclear requirement (that should be identified and asked about), is important part of the learning - that's 90% of cases in life.
Personally I prefer something like fizzbuzz which is a pure code question, it applies to candidates of all levels and tells you if they can reason through problems.
Yet he still thinks it is not a tricky question.
But great article and I learned something from it.
There should be a process for clarifying tickets with the team, because you don't just drop tickets on developers "out of nowhere" and expect them to ask questions, especially if the company culture is "drop the ticket on the dev and let him figure it out".
Someone has to either write the ticket in enough detail that you can hand it to a dev without them needing to ask questions, OR hold a refinement session where the team asks questions, so you get input from multiple people, because a single person is not able to understand all the details of the system.
Here's my reasoning for the problem: we have 2 files, one for each day, and we want to see who came both days on different pages. That's a job for join, with some sort|uniq in the middle.
- for each page, we want unique CustomerId -> PageId "mappings", but in UNIX land that's just 2 columns on the same row:
cat dayX | awk '{print $3, $2}' | sort | uniq
- now I have two lists, I join them join day1_uniq day2_uniq
this gives me, for each customer, its id and then all its pages on the same line. Customers who came only on one day are not in the output.
- now I want to see if there are at least 2 different pages. There's no easy UNIX way to do this because it's all on a single line, so we'll use awk to build a hashmap. We don't need a map of all pages; we only need to see if there are at least 2 different ones
cat both_days | awk '{delete pages; for (i = 2; i <= NF; i++) {pages[$i] = 1; if (length(pages) == 2) {print; next}} }'
(Note: length() on an array and delete on a whole array are gawk, not POSIX.) Result: a list of all customers having visited at least 2 different pages across both days. Everything is dominated by the initial sort. I haven't run this; it's a rough draft.
What is the problem with using later job-interview stages as validation for early stages, and what would be a better metric for validation?
but on a serious note, the good solution is sort of obvious, and in general you encounter much more interesting problems on a daily basis. But then again, I am working in Clojure, where working with data structures is much easier and more straightforward than in Java.
> Did I mean 2 unique pages per day or overall?
The author needs a course in logic. "They visited at least two unique pages." is not ambiguous. Visiting page A on day 1 and visiting page B on day 2 makes the sentence true.
Interviews are just not the same thing as real-life requirements gathering, so people's thought processes will not be the same. Even if you try to roleplay as someone who doesn't understand how to state requirements precisely, all of the normal procedures and thought processes for sussing that out are compromised by the interview setting. And your ability to assess those processes is compromised by how familiar you are with the question and how much you've deconstructed it.
It's not the time to be tricky (though the author somehow simultaneously believes it is and is not a trick question). There are so many more interesting things that could be gleaned from a person's interview performance that "is this interviewer playing fuck-fuck games with me" doesn't rate whatsoever.
The user has visited A and B on day 1, and A on day 2. So the total page hits is (A, B, A). Remove duplicates and you have (A, B) which makes the sentence true.
Imagine he said:
> They bought at least two unique products
What would you expect the requirements to be?
But that is purely a cultural convention. O(n²) is great for many important problems. For example, parsing a sentence in natural language is Ω(n³); getting it in O(n²) would be evidence that the candidate was a deity.
Why select for familiarity with the details and conventions of the interviewing process? How is that supposed to be helpful?
> Candidates then switch it around to have CustomerId as the Key, and PageId as the Value of the Map. But that’s not particularly great either because it overlooks the fact that you can have many pages per customer, not just one. Some candidates have the intuition that they need a Collection of pages as the Value of the Map
But this is wrong. You're completely fine having a map from CustomerId to a single PageId, because the problem statement specifies that you're collecting customers who have visited more than one page. If you process a record that says CustomerId = X, PageId = Y, and you look up CustomerId in your map, these are the possibilities:
1. map[CustomerId] has no entry. You write Y as the new entry for that CustomerId.
2. map[CustomerId] is already Y. You do nothing.
3. map[CustomerId] has an entry that is not Y. You now know that CustomerId represents one of the customers you're trying to find.
At no point did you need the map values to represent more than one page.
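The three cases above sketch out like this in Python (names are mine; this checks only the "≥2 unique pages" criterion over the concatenated records, with the both-days condition assumed to be handled separately):

```python
def multi_page_customers(visits):
    """visits: iterable of (customer_id, page_id) records."""
    first_page = {}   # customer -> the single page we've seen for them
    multi = set()     # customers known to have >= 2 unique pages
    for cust, page in visits:
        seen = first_page.get(cust)
        if seen is None:
            first_page[cust] = page   # case 1: no entry -> write the page
        elif seen != page:
            multi.add(cust)           # case 3: a second unique page
        # case 2: same page again -> do nothing
    return multi

visits = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
print(sorted(multi_page_customers(visits)))  # ['a']
```

The map never needs to hold more than one page per customer, exactly as the comment argues.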
> The condition “customers that visited at least 2 unique pages” tends to be a little harder for candidates to get right, so if they’re stuck I throw a little hint: you have a Set of pages from Day1, and a single page from Day2… how can you determine that this is at least two unique pages?
> Poor candidates will loop through the elements in the Set to check if the page from Day2 is in there. This turns your O(n) algorithm into O(n²) again. The number of candidates who have done this is surprising.
> Better candidates will do a .contains() on the Set which is an O(1) operation on a hash set. But there is a catch with the logic.
> The intuition to get this right is this: If you are inside that If loop and the customer visited at least two pages in Day1, and they visited any page in Day2, they’re loyal, regardless of which page they visit in Day2. Otherwise, they only visited only one page in Day1, so the question is: is this a different page? If so they’re loyal, else it’s a duplicate so you don’t know and should keep going. So your If statement has an Or:
> [code sample involving the interviewer's terrible solution]
> There’s a need for attention to detail, like using “>” instead of “>=” or missing the “!” in the second statement. I saw these fairly often. I didn’t worry. Great candidates spotted them quickly as they double-checked the algorithm when they were done. Good candidates spotted them after a little bit of hinting. That gave me a good signal on debugging skills.
Why in the world is this presented as a desirable solution? You have a Set of visited pages from day 1 and a single visited page from day 2. You want to know whether the total number of visited pages is more than 1.
Add the page from day 2 to the Set [O(1)], and then count the Set [also O(1)].
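In code that's just the following (Python sketch; it mutates the day-1 set, which is fine if the per-customer check is the last thing done with it):

```python
def is_loyal(day1_pages, day2_page):
    """day1_pages: set of pages seen on day 1; day2_page: the page
    just seen on day 2. Loyal iff the union has more than one page."""
    day1_pages.add(day2_page)    # O(1) amortized
    return len(day1_pages) > 1   # O(1)

print(is_loyal({"home"}, "home"))     # False
print(is_loyal({"home"}, "pricing"))  # True
```

No branching on which page came from which day, and no `>` vs `>=` traps to debug.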
> What if you pre-processed the files and sorted them by CustomerId, then by PageId?
> If the files are sorted, then the problem is easier and it’s just a two-pointer algorithm that you can execute in O(n) with O(1) of memory.
> Since the second sort key is by PageId, you follow another two-pointer algorithm to determine that there are at least two unique pages. So it’s a 2-pointer algorithm within a 2-pointer algorithm. It’s kind of a fun problem! I’ll leave the actual implementation as an exercise for the viewer.
> If you want to make the problem even more interesting, you can add a third file. I will leave that as an exercise for the reader as well!
If you're preprocessing the files, why not concatenate them before (or while) sorting them? The asymptotic resource requirements are the same and you end up with one file that can be processed in O(n) time and O(1) space. (Though the result of the algorithm necessarily takes up O(n) space, so I'm not sure how much this should count as an improvement in terms of space requirements...)
This additional preprocessing step makes the generalization to three files trivial. The algorithm is identical: concatenate the files, sort the monofile, and then walk through the sorted entries.
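A sketch of that concatenate-sort-walk version (sorting in memory here for brevity, whereas the article's variant assumes the files are pre-sorted externally; record shape is an assumption):

```python
def loyal_customers(day1, day2):
    """Concatenate, sort by (customer, page), then walk once through
    the runs of records belonging to each customer."""
    records = sorted([(c, p, 1) for c, p in day1] +
                     [(c, p, 2) for c, p in day2])
    loyal = []
    i, n = 0, len(records)
    while i < n:
        cust = records[i][0]
        pages, days = set(), set()
        while i < n and records[i][0] == cust:   # consume this customer's run
            pages.add(records[i][1])
            days.add(records[i][2])
            i += 1
        if len(pages) >= 2 and len(days) == 2:
            loyal.append(cust)
    return loyal

day1 = [("a", "x"), ("b", "x")]
day2 = [("a", "y"), ("b", "x"), ("c", "z")]
print(loyal_customers(day1, day2))  # ['a']
```

Adding a third file really is just another list comprehension tagged with day 3 (and `len(days) == 3`); the walk itself doesn't change.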