E.g. sure, when replays work, they're great, but:
a) you have to do a manual recording to create them the first time (which means manually setting up test data in production that is just-right for your test case)
b) you have to manually re-record when they fail (and that again means you have to manually go back and restore the test data in production that is just-right for your test case...and odds are you weren't the one who originally recorded this test, so good luck guessing exactly what that was).
In both cases, you've accepted that you can't easily setup specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".
So, IMO, you should focus on solving the core issue: the uncontrollable upstream system.
Or, if you can't, decouple all of your automated tests from it fully, and just accept that cross-system tests against a datasource you can't control are not a fun/good way to write more than a handful of tests (e.g. ~5 smoke tests are fine, but ~100s of record/replay tests for minute boundary cases sounds terrible).
Your test invokes the recorder. There isn't anything manual beyond writing & running your test.
> you have to manually re-record when they fail
Again, nothing manual. If you want to "refresh" a recording with a newer set of responses, you just run your test again with Polly in record mode.
> In both cases, you've accepted that you can't easily setup specific, isolated test data in your upstream system, and are really just doing "slightly faster manual testing".
This is by no means a replacement for E2E testing. It is a form of acceptance/integration testing where you're testing your application against a point in time at which you verified all systems were talking correctly with your application. E2E tests are much slower and more difficult to debug, and are intended to capture those breakages in contracts.
It's a tool for your toolbox; reach for it when needed. We plan to release a tutorial/talk that should clear up any misconceptions. There are also other applications for Polly, such as building features offline or giving a demo, using faker to easily hide any confidential data.
Sure, apologies for being negative about a tool you've worked on and are rightly proud of. I'm sure you already have more users than any open source project I've ever written. :-)
I struggle a bit at this point in my career, as I've made enough mistakes and seen enough mistakes, that I generally have strong gut opinions on "yeah, that's probably not going to work/scale/etc."
So, when observing new developers/teams starting to "make a mistake" that I've seen before, my gut says "no! bad idea!"...but I know I could be wrong, so it's tempting to say "well, sure, that didn't work for us, but go ahead and try again".
Because, who knows, maybe eventually someone will figure out an innovation that makes a previously-bad approach now tenable, and even best-practice.
But, realistically, that rarely happens, and so teams, orgs, the industry as a whole stumbles around re-making the same mistakes, and codebases/teams/etc. pay the cost.
I've thought a lot about micro-service testing at scale:
http://www.draconianoverlord.com/2018/01/21/microserving-tes...
Basically there are no easy answers, short of some sort of huge, magical, up-front investment in testing infra that only someone like a top-5/top-10 tech company has the eng resources to do.
So, definitely appreciate needing to do "something else" in the mean time. ...record/replay is just not a "something else" I would go with. :-)
Yes, sorry for being inexact/overusing the term--I understand the tests drive the recording.
What I meant by manual is getting the e2e system into your test's initial state.
E.g. tests are invariably "world looks like X", "system under test does Y", "world looks like Z".
In record/replay, "world looks like X" is not coded, isolated, documented in your test, and is instead implicit in "whatever the upstream system looked like when I hit record".
Which is almost always "the developer manually clicked around a test account to make it look like X".
This is basically a giant global variable that will change, and come back to haunt you when recordings fail, b/c you have to a) re-divine what "world looks like X" was for this test, and then b) manually restore the upstream system to that state.
If no one has touched the upstream test data for this specific test case, you're good, but once you get into ~10s/100s of tests, it's tempting to share test accounts and someone accidentally changes it, or you're testing mutations and your test itself changes it (so you need to undo the mutation to re-record), or you wrote the test 2 years ago and the upstream system aged off your data.
All of these lead to manually clicking around to re-setup "world looks like X", so yes, that is what I should have limited the "manual" term to.
In these cases I like to have an adaptor layer and use a recording framework to "test" the adaptor. That way I can occasionally re-record my scenarios and be notified if something important has changed. Normally what happens is that my service stops working for some unknown reason; I re-record the adaptor scenarios and the reason usually pops out very quickly. All the rest of my code is written against the adaptor, and I stub it out in those tests (which I can do reasonably well because I control it).
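As a hedged sketch of that layering (all names are illustrative, not from any real codebase): the adaptor is the only code that knows the upstream wire format, recorded scenarios exercise the adaptor, and everything else is tested against a stub of it.

```javascript
// The adaptor owns the upstream shape; re-recorded scenarios run against this
// class, so upstream drift surfaces here rather than all over the codebase.
class UserAdaptor {
  constructor(httpGet) {
    this.httpGet = httpGet; // injected, so tests can record or stub transport
  }
  async getUser(id) {
    const raw = await this.httpGet(`/api/users/${id}`);
    // Translate the upstream payload into the shape the app depends on.
    return { id: raw.user_id, name: raw.display_name };
  }
}

// The rest of the code is tested against a hand-rolled stub, which is easy to
// write precisely because we control the adaptor's interface:
const stubAdaptor = {
  getUser: async (id) => ({ id, name: 'Test User' }),
};

module.exports = { UserAdaptor, stubAdaptor };
```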
That makes sense.
Ideally protocols are declarative/documented/typed, e.g. Swagger/GRPC, so you can be more trusting and not need these, but often in REST+JSON that does not happen.
> All the rest of my code is coded against the adaptor
Nice, I like (the majority?) of tests being isolated via that abstraction.
Although, if "the protocol" is basically "all of the JSON API calls my webapp makes to the backend REST services", at that point do you end up with adaptor scenarios (record/replay recordings) for basically every use case anyway? Or do you limit it to only a few endpoints or only a few primary operations?
A better alternative IMO is to craft a list of resources in JSON, then use this data in a fake REST server that takes over fetch and XHR in the browser.
Something like:
{ posts: [{ id: 1, title: "foo" }, { id: 2, title: "bar"}], comments: [{ id: 1, post_id: 1, body : "lorem ipsum" }] }
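A hedged sketch of the dispatch such a fake server could do over that data (simplified; a real implementation would also patch fetch/XHR and handle writes):

```javascript
// In-memory "REST server": resolves GET paths against the JSON fixture above.
const db = {
  posts: [{ id: 1, title: 'foo' }, { id: 2, title: 'bar' }],
  comments: [{ id: 1, post_id: 1, body: 'lorem ipsum' }],
};

function handleGet(path) {
  const [, resource, id] = path.split('/'); // '/posts/1' -> ['', 'posts', '1']
  const records = db[resource];
  if (!records) return { status: 404, body: null };
  if (id === undefined) return { status: 200, body: records }; // collection
  const record = records.find((r) => r.id === Number(id)); // single record
  return record ? { status: 200, body: record } : { status: 404, body: null };
}

module.exports = { handleGet };
```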
Incidentally, that's the way [FakeRest](https://github.com/marmelab/FakeRest) has been working for years (disclaimer: I'm the author of this OSS package).

I think recording tools can be sharp tools and require care, but (as a starting point), if you have an automated library that can generate recorded fixtures in a repeatable, automated fashion, you can eliminate a lot of the pain points while still reaping all of the benefits. That's how we set it up where I am: responses and fixtures are generated as part of a full suite execution, but persist with individual test runs.
Especially at large corps like Netflix I'm sure there's a lot of hoops to jump through.
I looked through the codebase, and noticed that this uses a custom data format to persist HTTP requests and responses in local storage. I'm not sure if it's technically possible in all circumstances, but I think it might be valuable to have requests and responses be stored as HAR 1.2 [1] when possible, so that the trace can be used by other tools [2] to aid in debugging, verifying and analyzing behaviour as well as perhaps automated creation of load/performance tests.
[1] - http://www.softwareishard.com/blog/har-12-spec/
[2] - e.g. https://toolbox.googleapps.com/apps/har_analyzer/
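For reference, here's a minimal (heavily trimmed) example of the HAR 1.2 shape being suggested; a real file carries more required fields (timings, cache, cookies), and the creator values are placeholders:

```json
{
  "log": {
    "version": "1.2",
    "creator": { "name": "polly", "version": "x.y.z" },
    "entries": [
      {
        "startedDateTime": "2018-07-01T12:00:00.000Z",
        "time": 42,
        "request": {
          "method": "GET",
          "url": "https://api.example.com/users/1",
          "httpVersion": "HTTP/1.1",
          "headers": [],
          "queryString": [],
          "headersSize": -1,
          "bodySize": 0
        },
        "response": {
          "status": 200,
          "statusText": "OK",
          "httpVersion": "HTTP/1.1",
          "headers": [],
          "content": { "size": 15, "mimeType": "application/json" },
          "redirectURL": "",
          "headersSize": -1,
          "bodySize": 15
        }
      }
    ]
  }
}
```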
I like the API of this library and the browser support that was missing in nock. So thanks, Netflix! Although it would have been nice to see nock add this support, which is what I wonder about: why not just contribute to existing libraries?
It lets you create & configure mock HTTP servers for JS testing, but with one API that works out of the box in Node and in browsers. This avoids the record/replay model too, so you can still take the TDD approach and define your API behaviour in the test itself.
(Disclaimer: I'm the author of Mockttp)
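To illustrate the define-in-the-test style (a generic sketch of the idea, not Mockttp's actual API):

```javascript
// Tiny declarative mock registry: the test states the endpoint behaviour up
// front, so there is no recording step and "world looks like X" is explicit
// in the test itself.
function createMockApi() {
  const routes = new Map();
  return {
    onGet(path, status, body) {
      routes.set(`GET ${path}`, { status, body });
    },
    request(method, path) {
      return routes.get(`${method} ${path}`) ?? { status: 404, body: null };
    },
  };
}

// In a test, the expected API behaviour lives right next to the assertions:
const api = createMockApi();
api.onGet('/users/1', 200, { id: 1, name: 'Ada' });

module.exports = { createMockApi };
```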
I just want an ability to save and reopen exactly what I'm looking at. There are some cool websites which will eventually go down and I want to preserve an interactive snapshot of them.
However on smaller projects I've found that just clicking through to make sure things work and then letting my error reporting system catch bugs to be much more effective :)
It's a hard line to walk and I surely haven't perfected it. I'll give it a shot on a future project!
Why Polly?
Keeping fixtures and factories in parity with your APIs can be a time-consuming process. Polly alleviates this by recording and maintaining actual server responses without forgoing flexibility.
* Record your test suite's HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.
* Use Polly's client-side server to modify or intercept requests and responses to simulate different application states (e.g. loading, error, etc.).
EDIT: I am aware there are many other tools that can address this, we just haven't had the time yet to implement them. :)