My understanding is that Gatsby is a tool that converts a bunch of Markdown files into a static HTML website. Why are slow builds a problem for any static site generator? Why does it need a cloud?
In other words, what problem am I supposed to be having that any of this solves?
Note: I'm trying not to be skeptical here. My company's website is hand-maintained HTML with a bunch of PHP mixed in, so I can totally imagine that things may be better. But I don't understand the kinds of situations where using a third-party cloud to generate some static HTML solves a problem.
The build process is fairly fast on small/simple sites. Gatsby is very efficient overall and can render out thousands of pages drawing from large data sources rather quickly. The issue is that Gatsby isn't just used for personal blogs. As you can imagine, a site with thousands of pages of content that is processing thousands of images for optimization starts taking a long time (and a lot of resources) to build. For example, I'm building a Gatsby site for a photographer that includes 16,000+ photos totaling a few hundred GB. Without incremental builds, any change (e.g. fixing a typo) means every single page needs to be rebuilt.
Incremental builds mean you don't have to rebuild everything. Because all the data comes through GraphQL (which Gatsby pre-processes and converts to static JSON), it is possible to diff the data graph (i.e. determine what data a commit has changed) and work out which pages it affects (i.e. which pages have queries that access the changed fields). From there, Gatsby can rebuild only the changed pages.
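The dependency tracking described above can be sketched roughly like this. It is a minimal illustration, not Gatsby's actual implementation: it assumes each page records the IDs of the data nodes its query read during the last build, and that a snapshot of the data layer is a plain map from node ID to serialized content. `PageQueryRecord`, `Snapshot`, `diffNodes`, and `pagesToRebuild` are hypothetical names.

```typescript
// Hypothetical types for illustration; not Gatsby's internal API.
type NodeId = string;

// Snapshot of the data layer: node ID -> serialized node content.
type Snapshot = Record<NodeId, string>;

// What one page's query read during the previous build.
interface PageQueryRecord {
  path: string;        // URL path of the generated page
  dependsOn: NodeId[]; // IDs of the data nodes its query touched
}

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Return the IDs of nodes that were added, modified, or deleted between builds.
function diffNodes(before: Snapshot, after: Snapshot): NodeId[] {
  const changed: NodeId[] = [];
  for (const id of Object.keys(after)) {
    // Added (absent before) or modified (content differs).
    if (!has(before, id) || before[id] !== after[id]) changed.push(id);
  }
  for (const id of Object.keys(before)) {
    if (!has(after, id)) changed.push(id); // deleted
  }
  return changed.sort();
}

// A page is dirty if any node its query depends on has changed.
function pagesToRebuild(pages: PageQueryRecord[], changed: NodeId[]): string[] {
  return pages
    .filter((page) => page.dependsOn.some((id) => changed.indexOf(id) !== -1))
    .map((page) => page.path);
}

// Example: fixing a typo in one post touches that post's page and the index.
const before: Snapshot = { "post-1": "Helo world", "post-2": "Second post" };
const after: Snapshot = { "post-1": "Hello world", "post-2": "Second post" };
const pages: PageQueryRecord[] = [
  { path: "/post-1/", dependsOn: ["post-1"] },
  { path: "/post-2/", dependsOn: ["post-2"] },
  { path: "/", dependsOn: ["post-1", "post-2"] }, // index lists every post
];

const changed = diffNodes(before, after);
console.log(JSON.stringify(changed));                        // ["post-1"]
console.log(JSON.stringify(pagesToRebuild(pages, changed))); // ["/post-1/","/"]
```

A real system also has to handle queries that list nodes by type: an index page must be rebuilt when a new post is added, even though its previous query never read that post's ID. This sketch ignores that case.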
This not only means faster build times, it also means that only the changed pages and assets have to be re-pushed to your CDN. This way, content that hasn't changed will remain cached and only modified pages will have to be sent down to your site's users.
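The "only re-push what changed" step can be illustrated with content hashes: hash each output file, compare against a manifest saved from the previous deploy, and upload only new or changed files. This is a sketch under assumptions, not how Gatsby Cloud actually deploys. The manifest format and `planDeploy` are hypothetical, and the 32-bit FNV-1a hash stands in for a stronger hash such as SHA-256 that a real pipeline would more likely use.

```typescript
// Hypothetical manifest: output file path -> content hash from the last deploy.
type Manifest = Record<string, string>;

// 32-bit FNV-1a over UTF-16 code units. Stand-in for a real content hash
// (e.g. SHA-256); fine for a demo, too collision-prone for production.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 2^0, mod 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

interface DeployPlan {
  upload: string[];   // new or changed files to push to the CDN
  remove: string[];   // files that no longer exist and can be purged
  manifest: Manifest; // hashes to save for the next deploy
}

function planDeploy(previous: Manifest, files: Record<string, string>): DeployPlan {
  const manifest: Manifest = {};
  const upload: string[] = [];
  for (const path of Object.keys(files)) {
    const hash = fnv1a(files[path]);
    manifest[path] = hash;
    // Upload if the file is new or its hash differs from the last deploy.
    if (!Object.prototype.hasOwnProperty.call(previous, path) || previous[path] !== hash) {
      upload.push(path);
    }
  }
  const remove = Object.keys(previous).filter(
    (path) => !Object.prototype.hasOwnProperty.call(files, path),
  );
  return { upload: upload.sort(), remove: remove.sort(), manifest };
}

// First deploy: everything is new, so everything is uploaded.
const first = planDeploy({}, {
  "index.html": "<h1>Photos</h1>",
  "a/index.html": "<p>Helo</p>",
  "b/index.html": "<p>Unchanged</p>",
});
// Second deploy: a typo fix in one page means only that page is re-pushed.
const second = planDeploy(first.manifest, {
  "index.html": "<h1>Photos</h1>",
  "a/index.html": "<p>Hello</p>",
  "b/index.html": "<p>Unchanged</p>",
});
console.log(JSON.stringify(first.upload));  // ["a/index.html","b/index.html","index.html"]
console.log(JSON.stringify(second.upload)); // ["a/index.html"]
```

Everything not in `upload` keeps its existing CDN copy, which is what lets unchanged pages stay cached for visitors.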
Gatsby (and other frameworks) automate this process by going through whatever data sources you have (a directory of Markdown files, databases, etc.) and producing the HTML. Gatsby uses React for the templating logic and for any client-side interactivity on the pages. Build times scale with the size of your content and the number of pages to generate, so that's the reason for the cloud.
Overall, static sites are in the hype phase of the software cycle. Most sites are just fine using WordPress or some other CMS with a CDN in front to cache every pageview. Removing that server completely is nice, but most static sites end up using some hosted CMS anyway, and at that point you've just replaced one component with another. There are also advantages to completely separating the frontend code from the backend system for fancy designs or large teams.
1) There's a server-centric approach and a client-centric approach:
--a) hand-maintained HTML + PHP falls into the first camp
--b) React (/Angular/Vue) fall into the second
2) If you go with the second camp (b), you end up with a higher initial page load time (due to pulling in the whole "single page app" experience), but very fast transitions to "other pages" (really just showing different DIVs in the DOM)
3) Gatsby does some very clever things under the hood so that you get all the benefits of the second camp with virtually none of the downsides.
4) There are of course all kinds of clever code-splitting, routing & pre-loading things Gatsby does, but I hope I got the general gist right.
If not, Kyle, get the nerf gun out! -- how would you describe the Gatsby (& static sitegen) benefits? :)
(2) is both overstated and overvalued. It's overstated because loading a static HTML page from a CDN is extremely fast. Too many people who point at this advantage of SPAs are thinking back to pre-CDN days with slow origin servers. Of course, there are still use cases where going to the network is undesirable, but these aren't the primary use cases that Gatsby covers.
It's also overvalued in that most users are not getting to a page by navigating in a loaded site, they are coming from a social or search link (again, for the sort of use-cases that Gatsby pages are built for).
I love it because I vastly prefer serving static assets over server-side rendering for the many simplifications that brings (aggressive caching, predictable latency, etc.). In most cases you get to have the cake of complex sites generated from templates and eat the cake of static asset serving.
It saves time, especially for larger sites, because instead of rebuilding the entire site with all its pages, you only rebuild the pages that changed.
If it were just Markdown files you probably wouldn't need this, since parsing and transforming local Markdown files is fast. But this is JavaScript, so nothing is truly fast.
This is not really true; they often generate a static client-side web application rather than a dynamic app produced on the first request (or on every request) by server-side processing. This provides a highly optimized, largely self-contained application that avoids a lot of the runtime dependencies and complexity we typically get (e.g. web servers and databases). They are still highly dynamic through the use of APIs and such.
Gatsby has an extensive build pipeline and can query almost any data source during the build, but its original, basic data source is Markdown, and the JavaScript is React.
> [Slow builds] can be annoying if your site has 1,000 pages and one content editor. But if you have say 100,000 pages and a dozen content contributors constantly triggering new builds it becomes just straight up impossible.
Gatsby needs a cloud to host this build server. They also apparently host a nice content editing UI.
If you don't need a content editing UI, and/or are fine maintaining your own static builds, you presumably wouldn't subscribe to the cloud service.
Here's a talk from one of the creators: https://www.youtube.com/watch?v=EpYYe6aQjJM
Really appreciate the feedback and support for our launch today! The team worked super hard to get Incremental Builds live in public beta but are taking all the feedback (here and all over the web) as we go into full launch. Let us know what you think. Thanks!
Just read the post, congrats on the launch!
We've been using Gatsby on:
for the past few years, and are huge fans of your work.
I still recall the day when I brought Gatsby into our org, our front-end guys almost ate me alive :D
They said: a React.render(...) + GraphQL thing, why do we need it? What's the big deal?
Fast forward a few years, and Gatsby is (in my opinion) the best way to build a static website based on React.
Keep up the awesome work!
Your true fan, Denis
I know you guys are working towards SSR as well, but do you see a particular point of convergence between what you're doing and Next.js?
It seems that, given Next.js SSR, SSG and everything else already working now, Gatsby will eventually get to where Next.js is today.
Here are some thoughts on my experience with Gatsby:
- You've done a lot of work to make configuring Gatsby easier, but I still seem to constantly hit roadblocks trying to get the config I want. For example, I ran into problems getting MermaidJS, embedded video (hosted on my own machine, not on YouTube), and MDX files all working together.
- I've been thinking that Gatsby is the perfect framework for creating semantic web content. E.g., you could have calendar events sprinkled through a website and create a GraphQL API for listing those calendar events, and that API would be accessible during the build process.
We built https://mintdata.com/docs on it, and it's been proverbially better than sliced bread -- that is, a true joy to work with MDX.
What're the challenges/pitfalls you faced with MDX + Gatsby?
I get that the Gatsby company put a lot of effort into this and wants a return on that investment, and good for them. I assume a third party could offer the same, but why would they compete on the same value prop?
However, an open-source version that doesn't rely on any one company would be compelling to many.
To reliably provide near real-time deployments, we need tight integration with the CI/CD environment to optimize and parallelize the work; that's why you’ll see the fastest builds and deploys through Gatsby Cloud — the platform is purpose built for Gatsby!
I would have thought the generation process could be massively parallelised, and a typical blog page should only need a modest amount of computation (e.g. concatenating the header and footer, pulling in the body text, resolving a few URLs). I can't help but think about how much work a typical computer game does 60 times per second, even without a GPU.
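The point here is that rendering each page is independent work, so it splits cleanly across workers. A minimal sketch of that idea, assuming a hypothetical `Page` shape and a layout that is just a header and footer string; `renderPage` and `shard` are illustrative names, and the actual parallel execution (worker threads, processes, or separate machines) is left out.

```typescript
// Hypothetical page and layout shapes, for illustration only.
interface Page {
  slug: string;
  title: string; // assumed already HTML-escaped
  body: string;  // assumed already-rendered HTML
}

interface Layout {
  header: string;
  footer: string;
}

// Rendering one page depends only on its own inputs (no shared mutable state),
// so pages can be rendered on any worker, in any order.
function renderPage(page: Page, layout: Layout): string {
  return `${layout.header}<main><h1>${page.title}</h1>${page.body}</main>${layout.footer}`;
}

// Split items round-robin into `workers` shards (workers must be >= 1).
function shard<T>(items: T[], workers: number): T[][] {
  const shards: T[][] = [];
  for (let w = 0; w < workers; w++) shards.push([]);
  items.forEach((item, i) => shards[i % workers].push(item));
  return shards;
}

const layout: Layout = { header: "<header>Site</header>", footer: "<footer>Site footer</footer>" };
const pages: Page[] = [1, 2, 3, 4, 5].map((n) => ({
  slug: `post-${n}`,
  title: `Post ${n}`,
  body: `<p>Body ${n}</p>`,
}));

// Each shard could be handed to a separate worker; here they are simply
// rendered one after another.
const shards = shard(pages, 2);
const rendered = shards.map((group) => group.map((page) => renderPage(page, layout)));

console.log(JSON.stringify(shards.map((group) => group.map((p) => p.slug))));
// [["post-1","post-3","post-5"],["post-2","post-4"]]
console.log(rendered[0][0]);
// <header>Site</header><main><h1>Post 1</h1><p>Body 1</p></main><footer>Site footer</footer>
```

The concatenation itself is cheap; in practice the expensive parts tend to be the data fetching and image processing other comments mention, which this sketch does not model.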
How flexible is Hugo? And how many plugins does someone generally use?
It processes Markdown, JSON, YAML and SASS, can pull in data files from URLs, and has custom templates/themes, custom macros (shortcodes), image processing and live reload. It doesn't have a plugin system as far as I know, but nothing stops you from combining Hugo with other tools, e.g. running a JS script to pull in and transform a JSON file before Hugo runs.
It does have themes, but I just write my own.
However, in many cases the build is slow because you're doing something that's slow, like calling a REST API. You are not going to generate 10k pages in 2 seconds if you need to make 10k REST requests to a remote API, each taking 100 ms, to fetch the data for your pages. This kind of "data integration" from various sources is a standard use case for site generators like Gatsby and Next.js. It seems like what this is targeting is smarter caching to avoid such expensive calls when possible.
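The arithmetic in that example is worth spelling out: 10,000 sequential requests at 100 ms each is 1,000 seconds (about 17 minutes) of pure waiting, and even with 50 requests in flight it is still about 20 seconds. When most records haven't changed, caching helps more than parallelism. The sketch below assumes, hypothetically, that the API can cheaply report a version (e.g. an `updatedAt` timestamp) for each record; `fetchTimeSeconds`, `RecordCache`, and the injected `fetchRecord` function are illustrative names, not part of Gatsby or Next.js.

```typescript
// Idealized wall-clock time for `requests` calls of `latencyMs` each with at most
// `concurrency` in flight (ignores rate limits, retries, and latency variance).
function fetchTimeSeconds(requests: number, latencyMs: number, concurrency: number): number {
  return (Math.ceil(requests / concurrency) * latencyMs) / 1000;
}

interface CacheEntry {
  version: string;
  data: string;
}

// Hypothetical build-time cache: a record is re-fetched only when the version
// the API reports for it differs from the cached one.
class RecordCache {
  private entries: Record<string, CacheEntry> = Object.create(null);
  private fetchRecord: (id: string) => string;
  fetchCount = 0; // how many real (slow) fetches happened

  constructor(fetchRecord: (id: string) => string) {
    this.fetchRecord = fetchRecord;
  }

  get(id: string, version: string): string {
    const hit = this.entries[id];
    if (hit !== undefined && hit.version === version) return hit.data; // cache hit
    this.fetchCount++;
    const data = this.fetchRecord(id);
    this.entries[id] = { version, data };
    return data;
  }
}

console.log(fetchTimeSeconds(10000, 100, 1));  // 1000
console.log(fetchTimeSeconds(10000, 100, 50)); // 20

// Simulated API; in a real build this would be a slow network call.
const cache = new RecordCache((id) => `data for ${id}`);
const versions: Record<string, string> = { a: "v1", b: "v1", c: "v1" };

// First build: every record is fetched (3 fetches).
Object.keys(versions).forEach((id) => cache.get(id, versions[id]));
// Second build after editing only "b": a single additional fetch.
versions.b = "v2";
Object.keys(versions).forEach((id) => cache.get(id, versions[id]));
console.log(cache.fetchCount); // 4
```

The design hinges on that version check being much cheaper than fetching the full record; many APIs don't offer it, in which case the cache has to rely on time-based expiry or webhooks instead.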
Hugo is different in that it basically just transforms local HTML/Templates/Markdown. That's always fast. Even JS can handle that.
Do you know of any benchmarks that show that? As far as I know, most static site generators can take minutes to process a few thousand Markdown files with Hugo being the exception.
More technical details would be good, but I guess either I missed them or they treat them as IP.
JavaScript re-invents "Promises" because callback hell
JavaScript re-invents "compilers" (Babel)
JavaScript re-invents "build systems" (webpack, etc.)
JavaScript re-invents "caching" (incremental builds) - but paid, and in the cloud
Because why not.
Yes, it calls an API. And thankfully with Formspree, it's pretty easy to see the price breakeven points vs. hosting, but there are benefits to be had.