FDA launches openFDA to provide easy access to valuable FDA public data

(fda.gov)

173 points | loopasam | 11y ago | 27 comments

seanherron 11y ago

Great to see this on HN! I'm one of the openFDA core team members and would love to help people who are interested in using the public drug adverse event API. It's good to note that we've also released all of the source code behind the platform (https://github.com/fda) and are actively interested in having members of the community help us make improvements.

Please do ping me if you have any questions about the API or want to learn more! sean.herron@fda.hhs.gov

Also, here's a direct link to the API documentation: https://open.fda.gov/drug/event

Osiris 11y ago

I'm sure a lot of people, myself included, feel that government projects would be better and cheaper if they were developed as open source rather than the typical proprietary solutions developed by contractors that we see today.

What's your take on government and open source projects like this one?

seanherron 11y ago

Completely agree. We built openFDA from the beginning with the mindset that everything we produce will be open source. Our hope is that users of openFDA can help us make the API more efficient, return better data (we do a lot of cleanup), and independently verify our methodology.

Beyond improving our own site, it would be absolutely fantastic if someone took openFDA and spun up their own copy. That could be another government agency using it to serve up different data, an external group mirroring openFDA in case of government shutdown or other issue, or a company that uses our code to build something innovative.

I know that sentiment is shared among a lot of agencies right now. In particular, 18F (https://18f.gsa.gov) is a new digital services delivery unit that is looking to do this at a huge scale across the federal government.

afarrell 11y ago

The current state of federal IT contracting is so horrendous that it is worth trying something even if it has a 50% chance of failure. But how much harder is it to get contractors to work on a project if your contract mandates that it be open source or Free Software?

seanherron, do you work for a contractor or are you an in-house developer for the FDA or another federal agency?

loopasam (OP) 11y ago

The post mentions that "The FDA will continually work to identify additional public datasets to make available through openFDA" - do you guys already have an idea on what datasets are coming next?

seanherron 11y ago

We're going to focus on product recall and product labeling data next - expect to see some more releases throughout the summer.

IanCal 11y ago

I've found myself more and more depending on raw data dumps rather than APIs, so was extremely happy to see mention of this on the openFDA page. Are you currently offering this or is that still to come?

I've been doing luigi pipeline work recently, I might see if I can get yours running and get some pull requests in :)

machbio 11y ago

It's a great initiative, especially for people working in drug repositioning. Thank you for working toward such a nice API that suits bioinformaticians like me. I have a question: why do you limit API calls to 60,000 per day with a key? What is stopping you from setting a higher limit?

seanherron 11y ago

We set it to 120req/minute and 60,000req/day to ensure that load on the system isn't too high at launch. Over the next few weeks, we'll be adjusting the limits based on the traffic patterns we see.

As noted in the documentation, if you need more than 60,000 per day, give us a ring at open@fda.hhs.gov.

Huge shout out to api.data.gov as well - all of our key authentication and analytics are powered by their open source API Umbrella platform.
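For anyone scripting against the API, the two limits quoted above (120 requests/minute and 60,000 requests/day) can be respected on the client side. What follows is only a minimal sketch of a sliding-window throttle, not anything taken from the openFDA code base: the `RateLimiter` class, its parameter names, and the injectable `clock`/`sleep` hooks are all hypothetical, and the limits themselves may have changed since launch.

```python
import time
from collections import deque


class RateLimiter:
    """Client-side sliding-window throttle (hypothetical helper, not part of openFDA).

    Defaults mirror the limits quoted in the comment above:
    120 requests per minute and 60,000 requests per day.
    """

    def __init__(self, per_minute=120, per_day=60_000,
                 clock=time.monotonic, sleep=time.sleep):
        # Each window is (length in seconds, max requests, timestamps of recent requests).
        self.windows = [
            (60.0, per_minute, deque()),
            (86_400.0, per_day, deque()),
        ]
        self.clock = clock
        self.sleep = sleep

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        delay = 0.0
        for span, limit, stamps in self.windows:
            # Forget requests that have aged out of this window.
            while stamps and now - stamps[0] >= span:
                stamps.popleft()
            if len(stamps) >= limit:
                # Window is full: wait until its oldest request expires.
                delay = max(delay, stamps[0] + span - now)
        return delay

    def acquire(self):
        """Block until a request may be sent, then record it in every window."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        now = self.clock()
        for _, _, stamps in self.windows:
            stamps.append(now)
```

Calling `limiter.acquire()` immediately before each HTTP request keeps a single-process client under both limits. It does not coordinate across several processes sharing one key, which would need a shared counter.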

DrugCite 11y ago

Hey guys, I'm an owner of DrugCite.com. We've talked to the FDA a few times over the last few years while building our site (they contacted us at various points with questions). They let us know last August that they would be releasing this site, but I guess I never thought it would be so close to ours. Examples:

Our drug page: http://www.drugcite.com/?q=ABILIFY
FDA drug page: http://open.fda.gov/drug/event/

Looks like they're including some data we didn't previously have access to or know about. Anyway, make no mistake: this is huge and will be incredibly useful for doctors and patients everywhere. This is the type of data that should be investigated before you take any medication, prescription or not. I can see some of this data becoming common label information shortly.

droope 11y ago

I am sorry, but I don't see any similarities between the two pages.

danso 11y ago

It's really encouraging to see that there's someone in the U.S. gov't who not only cares about open source and the associated effects of transparency, but has some practical experience* in it.

The openFDA website is built on Jekyll (https://github.com/FDA/open.fda.gov) and its API is powered with Python and Node.js (https://github.com/FDA/openfda)...It's not just the framework/current-tooling that is nice, but that such systems use open, readable formats (such as Markdown for the web pages).

The current administration has always paid lip-service toward open-source...they won't satisfy people who think "open source" and "government" means hand over just about everything...but they're doing a good job making inroads on the parts of the U.S. data interfaces that were well-intended, but so obfuscated by poor design that it was a job in itself to parse/scrape their sites.

(FDA has always had really exhaustive dumps of their data...strewn about their legacy site...the API isn't as interesting to me as the documentation for the API and the pipeline of data)

* I don't want to just slag on Drupal...but Drupal was what Obama's head tech officer wanted in place, and to their credit, they did open-source parts of their custom Drupal modules...which were not particularly useful, because of the particulars of Drupal's module system and its quickly changing API...never mind being useful only for other Drupal installations. But a lot of credit has to go to the U.S. gov't for pivoting off of Drupal to a mix of WordPress, Jekyll, and even Node.js sites with less coupled components. It's been only about two or so years since Data.gov open-sourced its Drupal components before promptly switching to WordPress and CKAN modules...considering how a not-insignificant number of the fed sites are built on 12+ year-old code...the turnaround in the U.S. gov's stack is pretty amazing...(when it's not attempted on a service-critical site, such as healthcare.gov)

ef4 11y ago

If you're interested in this, you should also check out the UMLS published by the National Institutes of Health.

https://www.nlm.nih.gov/research/umls/

It's a very big semantic database of health terminology. Among other things, it has a subset called RxNorm that contains all currently prescribable prescription drugs.

I've been very impressed with it and I feel like not enough people have heard of it.

RA_Fisher 11y ago

I have some history working with the Adverse Event Reporting data. The API is nice, but it just exposes what they already offer in flat files .... stale data. Does it seem reasonable to you that the FDA runs a year behind on this data?

http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInforma...

They don't respond: https://twitter.com/statwonk/status/413355130461761536

gbinal 11y ago

This is awesome. Adding to the list of US Federal APIs - http://18f.github.io/API-All-the-X/pages/status

skram 11y ago

My colleague and I have developed a search interface for web browsers at http://openfdasearch.herokuapp.com/

Feel free to tweet at us at @SocialHealthIs, @Geek_Nurse, and/or @Skram

bello 11y ago

Awesome initiative! As I was testing it, I noticed that the response consists of prettified JSON. I'm guessing that all that whitespace can be removed to save bandwidth?

mmohebbi 11y ago

Thanks!

We support gzip on the API JSON response for clients that support it. Given that, I'd expect the size improvements from whitespace stripping to be minimal, but let us know if you have evidence to the contrary!

_ciz9 11y ago

Good point! I always blindly assumed that removing whitespace would lead to a decent size improvement, even after gzipping, so I ran a small test (gzip on Linux, default parameters):

http://i.imgur.com/U1O4Xg5.png

The raw JSON contains the whitespace, while it was removed in the minified JSON. So there is a 47% improvement for the uncompressed version, and a 21% improvement for the compressed version.

What would be interesting to see is how the second (compressed) number scales with the filesize (I don't know enough about compression algorithms to guess that).

EDIT: I really don't know how to format a table in plaintext...
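The comparison described above is easy to rerun locally. The following is a rough sketch, not the commenter's actual test: the `size_report` function and the sample records are made up for illustration (they are not real openFDA output), and the percentages will vary with the shape and size of the payload, so the 47% and 21% figures should not be expected to reproduce exactly. `compresslevel=6` is chosen to match the default level of the `gzip` command-line tool.

```python
import gzip
import json


def size_report(obj, compresslevel=6):
    """Compare pretty-printed and minified JSON sizes, raw and gzipped."""
    pretty = json.dumps(obj, indent=2).encode("utf-8")
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    minified = json.dumps(obj, separators=(",", ":")).encode("utf-8")

    sizes = {
        "pretty": len(pretty),
        "minified": len(minified),
        "pretty_gz": len(gzip.compress(pretty, compresslevel=compresslevel)),
        "minified_gz": len(gzip.compress(minified, compresslevel=compresslevel)),
    }
    # Fraction of bytes saved by minifying, before and after compression.
    sizes["raw_saving"] = 1 - sizes["minified"] / sizes["pretty"]
    sizes["gz_saving"] = 1 - sizes["minified_gz"] / sizes["pretty_gz"]
    return sizes


# Made-up records for illustration only; substitute a saved API response to test real data.
sample = {
    "results": [
        {"id": i, "drug": {"name": f"drug-{i % 7}", "dose": i % 50},
         "reactions": ["nausea", "headache", "fatigue"][: (i % 3) + 1]}
        for i in range(200)
    ]
}

report = size_report(sample)
for key, value in report.items():
    print(f"{key:12} {value:.3f}" if isinstance(value, float) else f"{key:12} {value}")
```

Running the same function over responses of different lengths is one way to explore how the compressed saving scales with file size.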

lucidrains 11y ago

Ok. This is actually pretty surprising and welcome news!

sogen 11y ago

Impressed, very very useful. Thanks!

Istof 11y ago

I hope the NSA will do the same and also open their datasets... maybe someone could catch someone who plans some "terroristic" actions before they happen...
