An example of this is California drought data: grabbing it automatically is incredibly difficult because it involves scraping HTML tables. I tried to build an API that presents drought data so volunteers would have an easier time building data visualizations, but I ended up exhausted by all the scraping work.
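For the curious, the core of the tedium is turning markup into rows. A minimal sketch with Python's stdlib `html.parser` (made-up drought data; real pages are far messier, which is exactly the problem):

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the cell text of every row in a <table>."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# Hypothetical sample; real drought tables have nested markup, colspans, etc.
html = ("<table><tr><th>County</th><th>Level</th></tr>"
        "<tr><td>Kern</td><td>D3</td></tr></table>")
p = TableParser()
p.feed(html)
# p.rows now holds the header row plus one data row
```

Every quirk in the real markup (nested tables, colspans, stray whitespace) means another special case in code like this, which is why doing it by hand wears you down.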
I then moved on to a new project: building a free-to-use Padmapper for affordable housing. The data for income-restricted apartment units is managed by a government-contracted vendor: a city or county declares income-stabilization policies and legally enforces them against landowners, and the landowners then send their lists of units to the vendor.
This would be great, except the vendor does the bare minimum. Padmapper looks amazing, but it's really only applicable to the upper middle class due to explosive housing costs in the Bay Area. So, to provide a more modern website and mobile application for the community, I started to scrape the vendor's website. It was terrible; I kept getting throttled. So I gave up.
We have a new WrapAPI API Builder that looks like a browser, and it's as easy to use as one too. You can define your API's inputs with a quick tap on the address bar, and point and click at the data you want to extract.
We also have a Chrome extension that's smarter and better-integrated than ever. It records your requests and automatically creates parameter inputs for the values that change between requests to the same endpoint. The contents of your captures are immediately ready for you to start defining outputs and the data to extract, too.
Let me know if you have any questions or feedback!
This is a big thing on many sites now.
Also, since that is the case, you could build this in a few hours using something like https://github.com/bda-research/node-crawler. Yes, it would have no GUI, so you lose that.
Just reading about Kantu now. It reminds me of http://www.sikuli.org/
Is this happening on your site? If not, I'd appreciate some tips on coding it, and on handling exception cases where the wizard can't stay in sync or the user clicks on unintended page elements.
The most helpful part is that you can pass a callback which will trigger before/during/after each step, which can let you ensure that the state of the page matches what you're expecting. In our case, we use it to make sure that you're switched to the right tab, etc. Take a look! I highly recommend it.
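The per-step callback pattern described above can be sketched generically (hypothetical names and shapes; the actual tour library's API differs):

```python
def run_tour(steps, callback):
    """Drive a guided tour, invoking the callback around each step so the
    host app can fix up page state first (e.g. switch to the right tab)."""
    for step in steps:
        callback("before", step)  # chance to sync page state
        step["show"]()            # display this step's tooltip
        callback("after", step)   # e.g. clean up highlights

# Usage: record the callback phases for two trivial steps.
events = []
steps = [{"name": "intro", "show": lambda: None},
         {"name": "address-bar", "show": lambda: None}]
run_tour(steps, lambda phase, step: events.append((phase, step["name"])))
```

The point is that the callback fires on every transition, so a mismatch between the tour and the live page can be caught and corrected before the next tooltip appears.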
I often see one or more commenters write what seems like an excessively positive thought dump on Show HNs. It just doesn't seem like the natural conversational tone everyone uses, but I can't quite put my finger on it.
Has anyone else noticed it? Is there a term for this sort of writing style?
That endpoint will then emit a state token, which includes the session cookies. You can feed that state token into your next request, and it'll authenticate you.
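To illustrate, here's one guess at what such a token could look like under the hood (a purely hypothetical encoding; the real token format is opaque to callers):

```python
import base64
import json

def make_state_token(cookies):
    """Pack session cookies into an opaque, URL-safe token.
    (Hypothetical format -- shown only to make the flow concrete.)"""
    raw = json.dumps(cookies).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def cookies_from_token(token):
    """Recover the cookies so the next request can send them back."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw)

# A login endpoint would emit the token...
token = make_state_token({"sessionid": "abc123", "csrftoken": "xyz"})
# ...and a later endpoint would turn it back into a Cookie header.
cookies = cookies_from_token(token)
cookie_header = "; ".join(f"{k}={v}" for k, v in cookies.items())
```

However it's encoded, the effect is the same: the caller never touches cookies directly, it just threads the token from one call into the next.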
I wanted to give you a heads-up that the YouTube video at the end of your Joyride tutorial is broken.
It tries to play this: https://www.youtube.com/watch?v=10yKzP3gtkc
Why?
When I was last working inside an organization and reviewing vendors for a product, it really left a bad taste in my mouth when they had "Ask for Pricing." I get it, my consulting work is basically Ask for Pricing, I understand the business strategy. But it's such a headache to sit through bullshit product demos for multiple vendors over a few weeks just to hear that their pricing structure is way out of line.
There is this idea that a lot of companies have, where they're more "professional" or conversion-optimized by removing public pricing and putting everyone through a sales funnel. But that concept only works if 1) you have a great product and 2) you have a great sales team, capable of making my time to failure in the conversion process fast and painless. Every company thinks they have this, but they almost never do. I really don't think you want to optimize your business for keeping stamp enthusiasts happy.
In the back of their heads, some people imagine the service is going to be huge, and then they worry that all the profits will be paid out to WrapAPI.
Better to have a high headline number and then offer discounts for certain uses (non-profits, open source, students, etc.). People are optimistic about how much money they might make, so a high headline future price for when you graduate from the free tier is not necessarily a bad thing.
WrapAPI seems to tackle the same task (web scraping) from a very different angle. I wonder if anyone has used both and can compare.
Let's say you have a web-based inventory management system or CRM that requires a login, but you want to take data a customer has sent you in a spreadsheet and automatically batch enter it into the CRM, which doesn't have that functionality. You could then:
1. Create an API endpoint that allows you to log into that system and return a state token
2. Create a second API endpoint that parametrizes the inputs of the form used to create a new inventory entry
3. Chain those 2 API endpoints together so that the 2 actions are actually combined into one API call
Our focus is not only on getting data, but on automating the many things that you or your company does with websites, to save time.
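The three numbered steps above can be sketched as a batch-import loop (hypothetical endpoint names; `call` stands in for an HTTP POST to a WrapAPI endpoint):

```python
import csv
import io

# Hypothetical endpoint names -- the real ones come from your own account.
LOGIN_ENDPOINT = "crm/login"
CREATE_ENDPOINT = "crm/create-entry"

def batch_import(csv_text, credentials, call):
    """Log in once (step 1), then create one inventory entry per
    spreadsheet row (step 2), carrying the state token forward each time."""
    state_token = call(LOGIN_ENDPOINT, credentials)  # emits session state
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The state token authenticates every follow-up call.
        results.append(call(CREATE_ENDPOINT, {"stateToken": state_token, **row}))
    return results
```

With step 3 (a chained endpoint), the login and create calls would collapse into a single API call, so the caller wouldn't handle the state token at all.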
I've used similar services like parsehub.com in the past, and if they hadn't had a pricing page I never would have tried them. Just my 2 cents.
You are using xpath here right?
They were bought by Palantir and wound down gracefully, keeping people's data available for a while and communicating well.
It was a great product, but it was still hard to build a practical business model around it.
This WrapAPI v2 seems like an alternative, but I would use it with care: the economic model is uncertain and it seems really new. Still promising! :)
The company that runs this software as a service needs to be very careful. 3Taps was similar and got destroyed for relaying data scraped from Craigslist.
Contacting the server after its operator has expressed its wish for you to stop is a violation of the CFAA (in that you are "exceeding authorized access" and/or gaining "unauthorized access" to a protected computer system). If it's found that the site's ToS is binding upon you, which it typically would be, you don't really even need separate notice to be held liable.
Storing a copy of a web page in RAM creates a copy that is eligible for copyright protection, and it is likely that any implied license to read that page will be invalidated by the access revocation.
IANAL.
https://books.google.ca/books?id=a-yu2-JUQNAC&pg=PT249&lpg=P...