It's usable, but very aggravating and uncomfortable to use.
But it seemed very solid overall, especially compared to the Selenium monstrosities I remember fighting (and giving up on) some years prior ...
I've never used Puppeteer, but having used a lot of US-hosted web services from Asia, I've seen plenty of latency-sensitive bugs (or at least annoyances not present for NYC-based users).
That's the best one. Of all the browser automators, Playwright is the most reliable: I've never had a wait fail or anything like that. Maybe I just got lucky, but if you're looking to do something with browser automation, try Playwright first, then look elsewhere.
Was it definitely not that there were inconsistencies in the pages that you were interacting with?
Curious as someone who has done some browser automation, but not in a while and never with Puppeteer.
As a web dev generalist, I can usually understand how most things work under the hood.
But playing with chrome.browserless.io breaks that. You're streaming the web page in a <canvas> element, but how can I highlight text? When I load a youtube video page are you literally proxying the video through your infra, through <canvas> pixels to my browser?
Who dictates which IP the headless Chrome is assigned? Do you have a lot of IPs? I noticed on some pages I'd get the Cloudflare captcha, which makes sense if browserless has to cycle through a limited pool of IPs that other people have already used to scrape other Cloudflare-protected pages.
As far as the hovering goes, the canvas element is “mirroring” interactions back through to the underlying page. When DevTools is active, this triggers Chromium to render hover effects in its GUI, which then get sent back to the canvas element in the debugging page.
It’s a lot of network traffic and synchronization... but once everything is set up it works fairly seamlessly.
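The mirroring can be sketched roughly like this (my guess at the wiring, not browserless's actual code; `canvas` and `cdp` are hypothetical handles to the mirror element and a DevTools session):

```javascript
// Translate a client-space pointer position into remote-page coordinates,
// accounting for where the canvas sits and any scaling of the screencast.
function toPageCoords(clientX, clientY, canvasRect, pageWidth, pageHeight) {
  const scaleX = pageWidth / canvasRect.width;
  const scaleY = pageHeight / canvasRect.height;
  return {
    x: (clientX - canvasRect.left) * scaleX,
    y: (clientY - canvasRect.top) * scaleY,
  };
}

// In the browser, the wiring would look something like this: forward each
// mouse move to the remote browser over the Chrome DevTools Protocol, so
// headless Chrome renders the hover state and streams the frame back.
//
// canvas.addEventListener('mousemove', (e) => {
//   const { x, y } = toPageCoords(e.clientX, e.clientY,
//     canvas.getBoundingClientRect(), remoteWidth, remoteHeight);
//   cdp.send('Input.dispatchMouseEvent', { type: 'mouseMoved', x, y });
// });
```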
> The application is written in TypeScript, and produces a static asset in the static directory once built.
What should I do with said artifact? How do I put it to use?
https://github.com/browserless/chrome/blob/5627f1ef041ec23f3...
The tour is still up at [2]. The servers that actually run the Remote Browser have since gone down, but interestingly you can still run the tour. That's because if you don't change the code in the REPL window, you get cached results (except step 7/7 which scrapes Hacker News and won't work). To get those results, we built a little tour "recorder" that would be run on every release. If I remember correctly, we allowed some dynamic ES6 imports through a custom Babel compiler for the code that's input, which also allows first level async stuff, which still works :)
[1]: https://github.com/intoli/remote-browser [2]: https://intoli.com/tour/
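The "first level async" part can be sketched like this (a hedged guess at the approach, not Intoli's actual compiler): wrap the REPL input in an async IIFE before evaluating it, so `await` and `return` work at the top of the user's snippet and the REPL gets back a promise it can render once it settles.

```javascript
// Wrap user-supplied REPL source in an async IIFE. Evaluating the result
// yields a promise, so top-level `await` inside the snippet just works.
function wrapReplCode(source) {
  return `(async () => {\n${source}\n})()`;
}

// Usage: `await` and `return` are legal at the top of the snippet.
const wrapped = wrapReplCode("const x = await Promise.resolve(41);\nreturn x + 1;");
eval(wrapped).then((result) => console.log(result)); // logs 42
```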
Is Puppeteer running on a web server with the REPL connecting to it? Or is Puppeteer completely contained within each user's browser?
I’m curious whether it’s possible to proxy the network requests so that, for example, they would use the browser’s IP address instead of the server’s.
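On the proxying question: Chromium accepts a `--proxy-server` switch, so a server-hosted Puppeteer could in principle route page traffic through a proxy running on the user's end (getting that proxy reachable from the server, e.g. via a reverse tunnel, is the hard part). A minimal sketch of just the Chrome-side configuration, with placeholder names:

```javascript
// Build Puppeteer launch options that point Chrome at an upstream proxy.
// If the proxy ran on the user's machine, page requests would egress from
// the user's IP rather than the server's. The URL below is a placeholder.
function launchOptionsWithProxy(proxyUrl) {
  return {
    headless: true,
    args: [`--proxy-server=${proxyUrl}`], // standard Chromium switch
  };
}

// e.g. const browser = await puppeteer.launch(
//   launchOptionsWithProxy('http://127.0.0.1:8080'));
```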