Show HN: Sitebulb, a website crawler and auditor for SEOs

(sitebulb.com)

120 points | hathawayp | 8y ago | 72 comments

throwaway2016a 8y ago

This looks really good. My only comment is that if I am purchasing a piece of desktop software I expect there to be at least the option of an annual license.

Monthly feels more like a service than a product and I think of desktop software as a product.

But I realize Desktop software pricing is evolving so this may just be old school thinking on my part.

Also, if I'm a Mac user (I am) I expect to be able to buy / pay through the app store. I know it sucks because Apple takes a huge cut but there is a level of trust (misplaced or not) and ease of use that comes with the app store. I am much more likely to use an app if it is in there. Just like on Linux I am much more likely to use an app if it is in a package manager.

Speaking of that, I don't know what desktop framework was used but, is there anything preventing a Linux version?

hathawayp (OP) 8y ago

Interesting comment regarding the App Store. To be honest I'm really surprised no one has ever said this to us before. We have another product - (http://urlprofiler.com/) - for Windows and Mac that we've been selling for 3 years and no one has ever given us this feedback.

We're not wedded to a price structure, although we're rolling with monthly for now. I'm pretty sure through weight of demand that we'll need to add Yearly plans in the next few months.

There's nothing preventing a Linux version (it's built in Electron) other than demand really. We'll do it if enough people want it, but we have a bunch of other features on our roadmap that are currently a higher priority.

throwaway2016a 8y ago

The app store has a few benefits. The biggest one, for paid software, is that if I format my computer or move to a new computer my licenses come over with my Apple account. I don't need to worry about remembering an account and license keys for each app.

I can't count how many apps I've paid for but I don't have installed right now because I lost a hard drive and my license keys with it.

Though I'm guessing with monthly billing it is more a username / password thing for you vs a key. (I haven't tried it yet, sorry).

The second benefit is not having to give my billing info to yet another company.

I'm not unlikely to install an app not from the app store. But I'd say I'm... complete guess here... 30% more likely to install it if it is in the app store.

And I can also say that if I have the choice between the two (same software on the app store and off), I will always do the one in the app store because of the previously listed benefits.

hathawayp (OP) 8y ago

Thanks for the feedback, good to know.

Yeah we use a username/password so there's no issue with losing a license key.

btown 8y ago

Adobe Creative Cloud (as well as how streaming services have made people at all levels inured to low monthly subscriptions) has made this type of monthly pricing much more normalized - good for providers who can now point to recurring revenue, not so great for users who usually end up paying more.

SyneRyder 8y ago

It's worth considering what your particular customers want. I make Photoshop plugins, and since the CC subscription came in, I've heard of a lot of customers jumping ship to one-off purchase products, like Affinity Photo and Paint Shop Pro. But some other customers are happy with it.

Also, while Creative Cloud is paid monthly, individual plans have a minimum term of 12 months. If you cancel, you have to pay out 50% of the contract term (unless you prepaid for the year, in which case there are no refunds).

http://www.adobe.com/legal/subscription-terms.html

In a sense, all software is service. After all, what you buy is a license, the right to use the software, you don't buy the software itself. I suppose the US doesn't view it this way, but it definitely varies from country to country.

toss1 8y ago

Technically, that's correct, although it's a conceit that was foisted upon us by the software industry, partly as an artifact of copyright law and partly to gain more control over their product than others could gain.

As futile as it is, I'd rather see the concept go away than see it further enshrined in subscription services.

WA 8y ago

The concept won’t go away, because users expect updates. The times when software actually was finished are over.

If we don’t expect software to have any updates at all, we can go back to one time fees.

mackatsol 8y ago

I'll add on to that.. if I download desktop software, I expect an app and not a package file I need to install.. especially if said package only contains an app and nothing else. Strange. :-)

I am personally fine with a monthly fee for desktop software. If I'm paying for software, it's up to the person who sells the software to decide how I should pay, and for me to decide if it's worth it.

gog 8y ago

I am definitely not fine with that. Of course it's up to the person selling the software to decide on the pricing model, but it's also up to the customers to voice their opinion.

I refuse to buy software that is subscription based. The closest to this that I find OK is PHPStorm which has a hybrid model where if you pay for a year you get to keep the version you installed at the beginning of your subscription (or after a year of monthly payments have been made).

I don't expect companies to work on the software for free, if there is an upgrade worth upgrading I am prepared to pay for it.

How many software licences do you pay for each month? Because they add up very, very quickly.

hathawayp (OP) 8y ago

Hey HN, Sitebulb is a desktop crawler for Windows and Mac, specifically designed for SEO consultants/agencies. My business partner and I have been building it for the last 2 years or so, and we're finally launching it today.

Its main differentiating factors:

1. Scale – it can comfortably crawl 500,000+ page websites despite being a desktop program.
2. Reporting – it does a lot of data manipulation and processing so you don't have to.
3. Visualization – it has tons of useful graphs, including the Crawl Maps, which help you visualise site structure.

Our aim was to give it the reporting capability of a SaaS crawler, with the convenience of a desktop crawler.

Looking forward to hearing your feedback on our new product. Thanks, HN community!

Sujan 8y ago

Some feedback:

- Why require the email confirmation before using the software? Not really necessary, is it?

- No Umlauts in project name?

- Standard/advanced settings switcher is confusing

- Crawl Maps is not linked from the "Product" dropdown

- "Recent audits" shows finished and queued ones, but not running ones (which also have no menu point)

- Super simple option to limit crawl to "internal" URLs would be nice (or did I miss it?)

- "Filtered URL Lists" is a strange navigation option, above the main selection especially

- Why no endless scrolling in tables? This is what a desktop app should do better than browsers

Nice tool!

hathawayp (OP) 8y ago

Thanks for all the detail! Here you go:

- Email confirmation is required for the username/password, which is how free and trial licenses are controlled, and ultimately how paid licenses are doled out. So we need it for the licensing.

- No special characters at all! Except periods. Sorry!

- Agreed, we need to improve the settings switcher.

- Crawl Maps is not linked - you mean on the website right? I'll fix that.

- Running audits show on the main Dashboard, seemed kinda overkill to put it on Recent Audits as well. No?

- You can switch off 'Check external' in the Advanced Settings. Kinda 'hidden away' to keep the main settings UI cleaner (otherwise where does it end?!)

- "Filtered URL Lists" - they are there because people want them ('a big list of all the URLs') and kept missing them in our usability tests!

- Why no endless scrolling in tables? It's not easy to do because the data is written to disk, rather than stored in RAM (which is the reason it can typically crawl more pages), so it needs to go and fetch/filter/etc... every time.

Sujan 8y ago

All makes sense, thanks for the reply.

> Email confirmation is required for the username/password, which is how free and trial licenses are controlled, and ultimately how paid licenses are doled out. So we need it for the licensing.

If you are interested in getting more free users into the app to try it, I would suggest reworking the licensing a bit to allow usage without an email, or at least without confirmation. It should be worth the effort, and you can still require a login when switching from free/trial to paid.

> Crawl Maps is not linked - you mean on the website right? I'll fix that.

Yep, no link in the feature dropdown.

> Running audits show on the main Dashboard, seemed kinda overkill to put it on Recent Audits as well. No?

Maybe. I like structure, so was expecting it to be shown a level down from the Dashboard somewhere as well.

> You can switch off 'Check external' in the Advanced Settings. Kinda 'hidden away' to keep the main settings UI cleaner (otherwise where does it end?!)

Ok, I think I am biased because I usually use a tool that is "internal only" by default.

> - "Filtered URL Lists" - they are there because people want them ('a big list of all the URLs') and kept missing them in our usability tests!

Umm, ok. "Crawled URLs" maybe?

> Why no endless scrolling in tables? It's not easy to do because the data is written to disk, rather than stored in RAM (which is the reason it can typically crawl more pages), so it needs to go and fetch/filter/etc... every time.

If some websites can do it with a request to the server each time the next results are loaded, I am sure you can also do that with whatever local database you use ;)
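To make the point above concrete, here is a minimal sketch of scroll-style paging over an on-disk store. It assumes the crawl results live in a SQLite table; the table name `crawled_urls`, its columns, and the helper names are hypothetical, and nothing in the thread says which database Sitebulb actually uses. Each "scroll" fetches only the next slice of rows after the last id seen (keyset pagination), so the full result set never has to sit in RAM.

```python
import sqlite3


def fetch_page(conn, after_id=0, page_size=100, status=None):
    """Return the next page of rows with id > after_id, optionally filtered by status."""
    sql = "SELECT id, url, status FROM crawled_urls WHERE id > ?"
    params = [after_id]
    if status is not None:
        sql += " AND status = ?"
        params.append(status)
    # Ordering by the primary key lets SQLite seek straight to the next slice.
    sql += " ORDER BY id LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()


def build_demo_db(n=1000):
    """Create a throwaway database with n fake crawl rows (hypothetical schema)."""
    # ":memory:" keeps the demo self-contained; a real crawl would use a file path.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crawled_urls (id INTEGER PRIMARY KEY, url TEXT, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO crawled_urls (url, status) VALUES (?, ?)",
        [
            (f"https://example.com/page/{i}", 404 if i % 10 == 0 else 200)
            for i in range(1, n + 1)
        ],
    )
    return conn


conn = build_demo_db()
first = fetch_page(conn)                         # initial screenful
second = fetch_page(conn, after_id=first[-1][0])  # user scrolled to the bottom
broken = fetch_page(conn, page_size=5, status=404)  # same paging with a filter
```

The UI would call `fetch_page` again with the last visible id each time the user nears the end of the table, which is the same pattern a web page uses when it requests the next batch from a server.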

The problem is that for big crawls (and 500k is not large) you probably don't want to use your desktop. For example, my home ADSL is only 3.5 as we are 6k yards from the exchange.

And I would not want to get my work's 100Mb line banned by Google. This is where services like DeepCrawl come into play: I can set up my sites to be crawled at night and look at the reports in the morning.

Another problem I found is that desktop crawlers are very resource hungry. At one small agency we had two stripped-down dedicated machines just to run crawls, as the risk of causing a crash was too high.

hathawayp (OP) 8y ago

Yeah for really big crawls you're probably better off sticking it on a server or AWS, as much as anything so you don't need to leave your computer on for ages.

But Sitebulb is not resource hungry in the same way as other desktop crawlers. It saves to disk instead of using RAM, so you don't experience the same limitations.

I'm not sure what you mean about Google. There is no link between Sitebulb and Google - it doesn't visit Google at all, so there is no risk of banning. Using it on your 100 Mb work line would be ideal.

stingraycharles 8y ago

Interesting that it’s all a desktop app. What problem do you think this solves compared to something that runs in the cloud? Apart from the cost structure, I can’t think of anything myself.

hathawayp (OP) 8y ago

The other big thing, in comparison to cloud software, is convenience. You can set up a crawl and start it running, and see URLs being crawled, within a minute.

On cloud software that's simply not possible, due to the way that everything is scheduled.

There are a few other small things, such as being able to view Audits offline (what we call 'train mode').

The cost structure can be a big limiting factor though, especially for smaller companies. Sitebulb effectively removes all limitations around number of domains, number of projects, total number of URLs crawled etc...

methyl 8y ago

> The other big thing, in comparison to cloud software, is convenience. You can set up a crawl and start it running, and see URLs being crawled, within a minute.

This depends on the implementation. If the architecture is modern and well thought out, using dynamic scaling or even AWS Lambda, the result should be available much faster on cloud software because it can parallelize. You only have so much network bandwidth and CPU power locally, and if you need to crawl hundreds of pages to get your result, it matters a lot.

Disclaimer: I'm building a SaaS tool for SEO which also involves page crawling.

throwaway2016a 8y ago

> On cloud software that's simply not possible, due to the way that everything is scheduled.

As someone who works in cloud software this makes me cringe a little.

I have no doubt this is how existing cloud SEO crawlers work but with elastic scaling, web sockets, and serverless there is no reason why this has to be true.

It is not a limitation of cloud software. It is a sign of devs and/or product owners deciding that instant results are not a priority for the product.

Edit: I hear that a lot from industries that are not intimately familiar with web apps. "You can't do that on the cloud"... a typical web software engineer will not be able to do it, but there are people out there who can. They are more expensive than your typical developer, but depending on your product they are worth it.

rmccoy6435 8y ago

The one advantage that I can think of is that this can run on websites that are in development and not accessible to something running in the cloud. A lot of enterprise websites have their dev/stage behind a VPN, and being able to run this against those without having to jump through hoops would be really nice (which it looks like this is capable of doing, since you just need to feed it a URL). On top of that you also don't have to worry about what they're doing with the data output by the program on their server.

methyl 8y ago

> On top of that you also don't have to worry about what they're doing with the data output by the program on their server.

Why would you care if the site is available on the Internet and you can't control who is browsing it?

This is a valid argument only for internal websites which are not a subject for SEO anyway.

hathawayp (OP) 8y ago

Yep exactly. All crawls are stored locally so there is no data issue to worry about.

mustafabisic1 8y ago

> We're nobody special. We're not a cool startup that's just secured funding, we're a bootstrapped, 2 man team and we've built both our products from the ground up.

Their honesty bought me! Nice going bros.

hathawayp (OP) 8y ago

:) thanks

You might like these as well: https://sitebulb.com/release-notes/

Ditto on the sitebulb love. It's a well-done product and a welcome challenger to the status-quo of SEO software out there.

I will say the cost gives me hesitation but you've put a lot of work into it so I understand the justification.

For the visualization, does the crawl map limit the connections? I was expecting to see more of a web with pages linked from the entire site. Can you tell us more about that?

Thanks

hathawayp (OP) 8y ago

Price is always a difficult one, trying to get the cost/value balance right. We did some pricing sensitivity testing before launch so I'm hopeful we've not got it too wrong.

Regarding Crawl Maps, yeah it does have some limitations, which I've written about here - https://sitebulb.com/resources/guides/crawl-maps-faqs/

Although from your comment I think you might be thinking it is a link map, rather than a crawl map. So with the Crawl Map it is mapping out how each URL/node was found when the crawler traversed the site. So each node will only ever have one edge/link.

A link map ends up a LOT more messy, although it's on our roadmap to try and build one of these too!
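As a rough illustration of the crawl map versus link map distinction described above, here is a small sketch. It assumes the crawler discovers pages breadth-first and records, for each URL, the page on which it was first found; the function name `crawl_map` and the toy site are hypothetical and not taken from Sitebulb. Every page other than the start page ends up with exactly one incoming edge, whereas the full link graph keeps every link.

```python
from collections import deque


def crawl_map(links, start):
    """Breadth-first traversal; returns {url: parent_url} for how each URL was first found."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # only the first discovery creates an edge
                parent[target] = page
                queue.append(target)
    return parent


# Toy site: every page links back to the home page and to a few others.
links = {
    "/": ["/about", "/blog"],
    "/about": ["/", "/blog"],
    "/blog": ["/", "/about", "/blog/post-1"],
    "/blog/post-1": ["/", "/blog"],
}

tree = crawl_map(links, "/")
tree_edges = [(p, c) for c, p in tree.items() if p is not None]
link_edges = [(src, dst) for src, targets in links.items() for dst in targets]
```

Even on this four-page site the crawl map has 3 edges while the link map has 9, which hints at why a full link map gets messy quickly on a real site.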

will_critchlow 8y ago

"Almost everything looks like a graph. Almost nothing should be drawn as one."

It's really hard to make sense of full link graph visualisations. I'm talking about this in an upcoming conference presentation. We should share notes :)

hathawayp (OP) 8y ago

Absolutely. In development we tried different ways to make the Crawl Map also represent link data, and they were all just unintelligible. Even the Crawl Maps on big sites are hard to get your head around, and that's with Sitebulb sampling quite heavily.

I'd love for us to come up with some sort of solution for it, I just don't know how we'd do it!

SL presentation I assume?

will_critchlow 8y ago

Oh - I meant to say - that quote is from this book: http://shop.oreilly.com/product/9780596514556.do

SnowingXIV 8y ago

What's this doing differently or better than Screaming Frog, which is also a desktop program and provides quite robust information? SF has been one of the standard industry tools for those doing SEO for years.

hathawayp (OP) 8y ago

The main difference from Screaming Frog (which is legitimately awesome) is the reporting. Once it has finished crawling it will do a lot of pre-processing for you and build graphs, lists of hints, etc...

I've written a more comprehensive answer to this here: https://sitebulb.com/resources/guides/how-is-sitebulb-differ...

0x4a42 8y ago

I'm testing Sitebulb right now (trial version). The crawling is kinda slow (I'm on 100 Mbit/s fiber). Why did you choose to build everything from scratch instead of making an application that uses the results from other crawlers/spiders (e.g. Screaming Frog) and just produces the audit reports?

EDIT:

And after about 3 hours of crawling, this is what I got (and no way to resume it):

> Audit Stopped!
> The audit stopped early because: Maximum Crawl depth limit of 50 reached

> WARNING: Audit Paused ! The audit is incomplete and did not finish properly.

0x4a42 8y ago

I left the program running in the background and it resumed itself after a few minutes. I have no idea why as there was no info on the dashboard.

scaryclam 8y ago

SF is also a lot cheaper. With VAT, you're looking at £705.60 for Sitebulb per year, vs £149.00 per year for SF. That's a really big price difference and you'd have to be really sure it's worth it.

In addition SF works on Ubuntu, which is another point in its favour.

hathawayp (OP) 8y ago

I know right, SF is just too cheap for its own good! :p

We think it's a case of horses for courses. Sitebulb has the potential to save you a ton of time when auditing and reporting. If you don't do a lot of that then it might not be a good option for you. If you do, that's where a lot of the value lies.

There's a fully featured 2 week trial to give it a proper go, and the monthly billing means you have the option to switch it on/off as you need it.

Agreed. This seems like SF + Gephi for 7x the price.

hathawayp (OP) 8y ago

We should put that on our homepage.

Have been using during the beta programme - really useful tool for doing site audits and focusing quickly on the areas that can make the most difference. Already using on client work and it is now a key tool for me alongside Screaming Frog and others. https://a.paddle.com/click?said=431&aaid=2812&link_id=380&ch... (my affiliate link)

hathawayp (OP) 8y ago

Sorry, I meant to say, there's a free 14 day trial available to anyone once you download the software (no credit card required).

MattLeBlanc001 8y ago

I keep trying to sign up, but it prompts me with: "you already have an account...", when in fact I don't.

hathawayp (OP) 8y ago

Hey I'd appreciate if you could ping me your email address to support@sitebulb.com so we can see what's going wrong with your account (or lack thereof).

blacksmith_tb 8y ago

Hmm, I set it to crawl our little Ghost blog (e.g. blog.example.com) and it immediately jumped to spidering our main site (e.g. www.example.com). Now, I imagine if I had properly poked at the Advanced settings I could have limited it to the initial subdomain, but I would have expected that to be the default...

hathawayp (OP) 8y ago

It should stick to the subdomain you specify in the start URL, unless there is a redirect or something. Other subdomains won't be crawled, although it will HTTP status check links to subdomains. So possibly that is what you saw in the URL log on the crawl progress page?

If you want me to take a closer look send the subdomain over to support@sitebulb.com and I'll see what's going on.
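The scoping rule described above can be sketched roughly as follows. This is an illustration under assumptions, not Sitebulb's code: the function name `classify_link` and the two labels are made up, and redirects (which the reply mentions as an exception) are deliberately not handled.

```python
from urllib.parse import urlsplit


def classify_link(start_url, link_url):
    """Decide whether a discovered link is crawled or only status-checked.

    Links on the same host as the start URL are crawled; anything else,
    including sibling subdomains, only gets an HTTP status check.
    """
    start_host = urlsplit(start_url).hostname
    link_host = urlsplit(link_url).hostname
    if link_host == start_host:
        return "crawl"
    return "status_check_only"


start = "https://blog.example.com/"
same_host = classify_link(start, "https://blog.example.com/first-post")
other_subdomain = classify_link(start, "https://www.example.com/pricing")
```

Under this rule a crawl started on `blog.example.com` would still list `www.example.com` URLs in its log, because they are status-checked, without spidering them.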

cubano 8y ago

Avast free WebShield on Win7 is giving me a FileRepMalware error and is aborting the download and connection.

I am not exactly an expert on Avast and this particular error but perhaps someone here would like to know about this.

hathawayp (OP) 8y ago

That's frustrating, sorry. It's a 'reputation issue', that over-protective anti-virus software doles out to smaller software vendors like ourselves. Basically they don't know if it is good or bad because we haven't had millions of installations.

i.e. it's a false positive

cubano 8y ago

Sure, no worries. Of course I personally knew that, but I thought it might be useful for you guys to know it's being blocked by the AV.

It's a really nice product BTW...good job.

I've been testing Sitebulb for a few months now and I'm really impressed, you've done solid work guys :) good luck with the launch

hathawayp (OP) 8y ago

Thank you! And thanks for the beta feedback, it was super helpful.

sbeckeriv 8y ago

Nice to use and useful insights. Thanks!

danielsamuels 8y ago

How does this compare to something like Scrutiny, which seems to do a similar thing?

hathawayp (OP) 8y ago

It has a lot more comprehensive reporting and data visualization than the likes of Scrutiny. I have no idea of the scale limitations of Scrutiny, but I'd be very surprised if it can handle ~500,000 URLs.

Also Sitebulb is for both Windows and Mac.

grandpoobah 8y ago

why is the windows download 130mb? that seems excessive

garethbrown 8y ago

Yeah and that's compressed :) I'm in the process of getting it down, but there are a few things in there that don't help its size. For instance, the latest versions of Electron and Phantom are reasonably big.

hayksaakian 8y ago

How does this compete against screaming frog?

hathawayp (OP) 8y ago

Answered this one below already. For completeness:

The main difference from Screaming Frog (which is legitimately awesome) is the reporting. Once it has finished crawling it will do a lot of pre-processing for you and build graphs, lists of hints, etc... I've written a more comprehensive answer to this here:

https://sitebulb.com/resources/guides/how-is-sitebulb-differ...

hayksaakian 8y ago

thanks, i'll check it out

chad_strategic 8y ago

Linux version?

hathawayp (OP) 8y ago

I'm afraid not, at least not yet. It's something we'll work on if the demand appears to be there.

Right now we are focused on other features that appear to be a higher priority to our users.

lpasselin 8y ago

I'd also be interested
