Show HN: C library of generic, reference-counted data structures

(theck01.github.io)

91 points | theck01 | 12y ago | 40 comments

comex12y ago

A small observation: CoreFoundation typedefs CFTypeRef (the generic object type) to void *, rather than having obj * be a separate type as in this library. This lets you pass pointers to any CF type to functions that expect CFTypeRef without the ugly cast. It's unsafe, as you can pass a pointer to any type without the compiler complaining, but so is explicitly casting. If you're not going to go the void * route, it would be nice to provide some macro like obj(x) to safely cast only valid types to object.
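One way such a checked cast could look, as a rough sketch only: it assumes a C11 compiler (for _Generic), and the types obj, ob_string and ob_vector plus the macro name OBJ are made up for illustration, not taken from the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's types; all names are assumptions. */
typedef struct obj { unsigned ref_count; } obj;
typedef struct ob_string { obj base; const char *text; } ob_string;
typedef struct ob_vector { obj base; size_t length; } ob_vector;

/* Expands to an obj * only for the pointer types listed here. Any other
 * argument type has no matching association, so the call fails to compile. */
#define OBJ(x) _Generic((x),          \
    obj *:       (obj *)(x),          \
    ob_string *: (obj *)(x),          \
    ob_vector *: (obj *)(x))

/* A generic function that takes any object through the base type. */
static unsigned ref_count_of(obj *o) {
    assert(o != NULL);
    return o->ref_count;
}
```

With this, ref_count_of(OBJ(&some_string)) compiles, while ref_count_of(OBJ(&some_int)) is rejected at compile time. The cast itself is valid because the base struct is the first member of each derived struct.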

Anyway, it's nice to see a portable, modern CF-like library, although of course it's too slow for some cases where code specialization is called for.

Edit: On the general subject of C data structures, in case anyone's interested, I wrote a little portable C header that provides minimal hash table functionality using macros, with a large amount of control - still generic, but totally opposite to the approach used by this library, much lower level. It's modeled after the BSD <sys/queue.h> header, provides three versions (open, closed, and a higher level version), and is completely undocumented, but I'm somewhat happy with it as a proof of concept. Just throwing it out there. https://github.com/comex/cbit/blob/master/hash.h

jzwinck12y ago

I was interested to know how this compares with existing C libraries, so I "ported" the prime number example program to use GMP and GLib instead. It's not quite working yet (see the description) and I have to run now, but here's the code--I think it's nearly complete and would be interesting to benchmark once done. https://gist.github.com/jzwinck/5787359

theck01OP12y ago

Shoot, if I had known someone was going to do this I would have made the prime number program a bit more efficient. Since it looks like you've ported the algorithm exactly, it shouldn't affect the result though. Let me know what the test reveals; I tried to write the fastest arbitrary precision algorithms I could.

andrewcooke12y ago

nice. it's heartwarming to see things like this in c. in my spare time i am working on struct-sql mapping. it sometimes feels like the popular c libraries don't try hard enough to provide the convenience we get in other languages. with a little imagination (as here) i think we can push things a little further...

jessedhillon12y ago

Is your project hosted somewhere? I would love to see the source for that, no matter how early/unpolished it is.

andrewcooke12y ago

sure, there's an example at https://bitbucket.org/isti/c-orm/src/4ec76f741d756bf70c2b1b8... but i don't have any text explaining it yet, and it's a bit hard to see what is happening.

but basically there's a python script that parses your struct definitions and then auto-generates a library (this isn't great i know, but what else can you do? if you don't want to use that, you can still use it as an SQL library, but you need to write your own callback functions to do the work of setting values in the struct). that library is

    #include "phonebook.corm.h"

in the linked code. then you can do things like

    static int find_name(isti_db *db, const char *text, name** name) {
      corm_name_select *select = NULL;
      *name = NULL;
      STATUS;
      CHECK(cname.select(&select, db));
      CHECK(select->name(select, "like", text)->_go_one(select, name));
      EXIT;
      if (status == ISTI_ERR_NO_RESULT) status = ISTI_OK; // see NULL name
      if (select) status = select->_free(select, status);
      RETURN;
    }

(the STATUS, CHECK, EXIT and RETURN are just macros for the usual "return an int as status and goto exit on error" handling). the snippet above populates the name (a struct typedef) pointer with values (id and name value in this case) from the database. the thing called select is a struct with function pointers that generates SQL select functions (so there's the usual objects-in-C pattern of passing the struct in to the function in "select->name(select, ..." for example). and the SQL being generated is "select * from [table] where name like [value]". it's all escaped correctly to avoid SQL injection.
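for anyone who hasn't seen that macro pattern, here is a minimal sketch of how such macros might be defined. these are not the actual c-orm definitions; the DEMO_* codes, the label name cleanup and the function dup_checked are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; the real library uses ISTI_* values. */
enum { DEMO_OK = 0, DEMO_ERR_BAD_ARG = 1, DEMO_ERR_NO_MEM = 2, DEMO_ERR_STEP = 3 };

/* Assumed definitions of the four macros; the real ones may differ. */
#define STATUS      int status = DEMO_OK
#define CHECK(expr) do { if ((status = (expr)) != DEMO_OK) goto cleanup; } while (0)
#define EXIT        cleanup:
#define RETURN      return status

/* Copies src into a fresh buffer. "injected" stands in for a later step
 * that may fail; on any failure the buffer is freed and *out stays NULL. */
static int dup_checked(const char *src, int injected, char **out) {
    char *buf = NULL;
    STATUS;
    assert(out != NULL);
    *out = NULL;
    CHECK(src != NULL ? DEMO_OK : DEMO_ERR_BAD_ARG);
    buf = malloc(strlen(src) + 1);
    CHECK(buf != NULL ? DEMO_OK : DEMO_ERR_NO_MEM);
    strcpy(buf, src);
    CHECK(injected);
    *out = buf;
    buf = NULL;               /* ownership moved to the caller */
    EXIT;
    free(buf);                /* no-op on success, releases on failure */
    RETURN;
}
```

everything after EXIT runs on both the success and the error path, which is why the cleanup code has to be safe in either case (here, free(NULL) does nothing).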

the API for the database backend is isolated out, but the only current implementation is sqlite.

the complete repo is at https://bitbucket.org/isti/c-orm/src (if you download it and run doxygen you should see better docs - but they're incomplete and not online yet)

if people are interested, email me at andrew@acooke.org and i'll get back to you when it's in a usable state (it mainly needs docs, polishing, and perhaps another database backend - probably postgres).

also, of course, there are many limitations compared to ORM systems. there's no way to retrieve related objects, for example (so if a struct has a pointer to another struct, that pointer isn't retrieved - the best you can do is also store a pointer to the primary key and then make a second call based on that). and currently table and column names must exactly match struct and field names.

canadev12y ago

What is struct? Or do you mean the C struct type <-> SQL? Like an ORM for C?

andrewcooke12y ago

yup, exactly - see link above.

obiterdictum12y ago

Along the same lines, there is also CCAN.

http://ccodearchive.net/list.html

https://github.com/rustyrussell/ccan

And GMP for arbitrary precision math (integers and floating point numbers).

http://gmplib.org/

huhtenberg12y ago

I find that the decision to adopt a C library for a project frequently boils down to whether it is using a compatible naming notation. Especially for simpler libraries.

The linked library is nice, but how many C projects do you know that use camelCase() function naming as opposed to K&R's lower_case_naming()? If it were a complex library, like OpenSSL, then sure, to hell with the notation, just put a wrapper around it and use it anyway. It's barely an issue. But if it is a simpler library that is meant to be woven into the code, like data containers, the choice of naming notation is always a thing to consider.

There is obviously an indent tool, but resorting to it means using a modified version of the original with all the consequences that follow. Perhaps, it might be the next thing for GitHub to tackle - "Download in Xyz naming notation"... I know I'd use it.

mappu12y ago

Was there a rationale to choosing performActionOnStruct() names instead of the perhaps-more-idiomatic struct_perform_action() style?

I strongly advise against putting identifiers like release() and getCString() in the global namespace, that's probably not the wisest idea if you plan to use libraries other than your own.

theck01OP12y ago

Some bad habits from Java are the only reason. I thought about having obrelease and obgetCString. Putting ob at the beginning of every function seems like overkill, and I don't like that only some functions would have ob at the beginning if I didn't do a global name change. Any suggestions?

mappu12y ago

> Putting ob at the beginning of every function seems like overkill

That's "C-style namespaces" - really the only sane way to avoid an identifier collision, sorry.
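A rough sketch of what a prefixed reference-counting API might look like. The ob_ names here are assumptions for illustration and do not come from the library; ob_live_count exists only so the example can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical base object; every public identifier carries the ob_ prefix. */
typedef struct ob_obj {
    unsigned ref_count;
    void (*dealloc)(struct ob_obj *self);  /* called when the count hits zero */
} ob_obj;

static int ob_live_count = 0;  /* number of objects not yet deallocated */

static void ob_default_dealloc(ob_obj *self) { free(self); }

/* Creates an object owned by the caller (reference count of 1). */
static ob_obj *ob_create(void) {
    ob_obj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->ref_count = 1;
    o->dealloc = ob_default_dealloc;
    ob_live_count++;
    return o;
}

/* Takes an additional reference. */
static ob_obj *ob_retain(ob_obj *o) {
    assert(o != NULL);
    o->ref_count++;
    return o;
}

/* Drops one reference and deallocates the object when none remain. */
static void ob_release(ob_obj *o) {
    assert(o != NULL && o->ref_count > 0);
    if (--o->ref_count == 0) {
        ob_live_count--;
        o->dealloc(o);
    }
}
```

A bare release() can clash with any other library that exports the same name; ob_release() only clashes with a library that picked the same prefix.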

McUsr12y ago

Hello. May I suggest looking for some code by tptacek for the red-black trees. I am looking too, but so far I haven't been able to find it. I read about it first in this link: <https://news.ycombinator.com/item?id=4455225>.

I think your project looks very interesting. Reference-counted data structures are a great convenience.

Thanks, and Good Luck further.

canadev12y ago

I've been meaning to refresh and expand both my algorithms knowledge and my C programming with a project like this, myself (as an application developer I find I've barely used either since graduating from university). Was that part of your goal at all?

I think this is pretty cool. I like the presence of the tests as well.

theck01OP12y ago

I just graduated, so most of the data structures are still pretty fresh in my mind. I started after getting tired of implementing data structures in my many C based classes, hoping that at least I would get some use out of them during school. Sadly after that I never took another C based class, but the project became interesting in its own right and I stuck with it.

icodestuff12y ago

Definitely forking this to provide a saner API. And some of the implementation is a little iffy (like objIsOfClass). But in general, very cool. Thanks for contributing to the C community!

Any particular reason why you're avoiding the C99 _Bool?

rdtsc12y ago

I personally like and use uthash (no reference counting) but:

* only a header file
* has hash, array, string, list implementations

https://github.com/ned14/uthash

simgidacav12y ago

This reminds me of one of my oldest projects: http://libdacav.sourceforge.net/

However I didn't do reference counting. Good job OP!

songgao12y ago

star-able url: https://github.com/theck01/offbrand_lib/

pepijndevos12y ago

What are "class[es] built from functions and incomplete types"?

I once built classes using structs and function pointers, but this sounds different?

theck01OP12y ago

An incomplete type is a struct whose definition is hidden from most of your source code. This makes it impossible to directly access struct elements, and is like data hiding in an object-oriented language. Of course, in this library you can always include the header containing the struct definition and go bananas, so it's a soft restriction.
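A small sketch of the idea, squeezed into one file for brevity. The counter type and its functions are invented for this example and are not part of the library; in a real project the two halves would live in a public header and a private source file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what the public header would expose ---- */
typedef struct counter counter;      /* incomplete type: no fields visible */
counter *counter_create(int start);
int counter_next(counter *c);
void counter_destroy(counter *c);

/* ---- what the private implementation file would contain ---- */
struct counter { int value; };       /* full definition, hidden from users */

counter *counter_create(int start) {
    counter *c = malloc(sizeof *c);
    if (c != NULL) c->value = start;
    return c;
}

/* Returns the current value, then advances the counter. */
int counter_next(counter *c) {
    assert(c != NULL);
    return c->value++;
}

void counter_destroy(counter *c) { free(c); }
```

Code that only sees the header part can hold and pass around a counter *, but writing c->value there is a compile error because the struct's fields are unknown.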

enqk12y ago

I would suggest offering the ability to plug custom allocators.

ambrop712y ago

If you feel a need to do that, check cmccabe's comment on the top about intrusive data structures. These let you do your own memory management, in a way that the data structure implementation doesn't need to know about.

talloaktrees12y ago

Glad to see this.

cmccabe12y ago

It's nice that you made these available. You might also want to look at queue.h from BSD (see http://fxr.watson.org/fxr/source/sys/queue.h) and tree.h (http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/sys/tree.h), single-file "libraries" that can generate a few different kinds of linked list and binary tree. These don't require typecasting at all, since they generate functions for your particular type. They are also intrusive data structures, meaning you control your own allocation.
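To give a feel for the intrusive style, here is a minimal sketch using the TAILQ macros, assuming a system that ships <sys/queue.h> (the BSDs, macOS and glibc do). The struct job and the helper sum_ids are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* The list linkage lives inside the element itself, so the list code
 * never allocates; the caller decides where each job is stored. */
struct job {
    int id;
    TAILQ_ENTRY(job) link;
};

/* Declares struct job_list, a list head specialized for struct job. */
TAILQ_HEAD(job_list, job);

/* Walks the list without any casts: the macros know the element type. */
static int sum_ids(struct job_list *list) {
    int total = 0;
    struct job *j;
    TAILQ_FOREACH(j, list, link) {
        total += j->id;
    }
    return total;
}
```

Elements can sit on the stack, in a static array or in malloc'd memory. After TAILQ_INIT(&list), a call like TAILQ_INSERT_TAIL(&list, &a, link) only rewires pointers inside the structs.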

With regard to OBString: don't. Really. C strings are very flexible and powerful with just what's in the standard library. You don't need charAtStringIndex, for example: you have rindex and index. You don't need splitString; you have strtok_r. You don't need findSubstring; you have strstr.
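A quick sketch of what that looks like in practice, using only strstr and strtok_r. The helper names split_commas and find_offset are made up here, and strtok_r is POSIX rather than ISO C, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits a writable buffer on commas, storing up to max token pointers.
 * strtok_r overwrites delimiters with '\0' and skips empty fields. */
static size_t split_commas(char *buf, char **tokens, size_t max) {
    size_t n = 0;
    char *save = NULL;
    for (char *tok = strtok_r(buf, ",", &save);
         tok != NULL && n < max;
         tok = strtok_r(NULL, ",", &save)) {
        tokens[n++] = tok;
    }
    return n;
}

/* Returns the offset of needle in hay, or -1 if it does not occur. */
static ptrdiff_t find_offset(const char *hay, const char *needle) {
    const char *hit = strstr(hay, needle);
    return hit != NULL ? hit - hay : -1;
}
```

Note that strtok_r modifies the buffer it is given, so it cannot be pointed at a string literal or at data that must stay intact.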

If you think you need regexes, consider fnmatch, a libc function which matches shell wildcard patterns. It's always there, it's simple, and it's fast. Or you could bring in something like PCRE. In general, if you need full regexes in a small program, it might be worth rethinking whether your program should be in C anyway.
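For reference, a tiny sketch of fnmatch in use, assuming a POSIX system. The wrapper name glob_match is invented for the example.

```c
#include <assert.h>
#include <fnmatch.h>

/* Returns 1 if name matches the shell wildcard pattern, 0 otherwise.
 * fnmatch itself returns 0 on a match and FNM_NOMATCH on a mismatch. */
static int glob_match(const char *pattern, const char *name) {
    return fnmatch(pattern, name, 0) == 0;
}
```

Patterns use shell syntax (*, ? and bracket sets) rather than regex syntax, so glob_match("*.c", "main.c") matches while glob_match("*.c", "main.h") does not.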

For some reason, I don't find myself using resizable array utility code much in C. It's pretty easy to do it yourself with realloc-- that's really one of the first things they should teach people learning practical C. Perhaps a small, minimal set of macros could wrap the reallocs and make it a little bit nicer.
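The do-it-yourself version might look roughly like this sketch. The int_vec type and its functions are assumptions made up for illustration; the doubling growth factor and the starting capacity of 8 are arbitrary choices.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A growable array of int: data pointer, used length, allocated capacity. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Appends value, doubling the capacity when full.
 * Returns 0 on success, -1 if the allocation fails or would overflow. */
static int int_vec_push(int_vec *v, int value) {
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        if (new_cap > SIZE_MAX / sizeof *v->data) return -1;
        /* Keep the old pointer until realloc succeeds, so nothing leaks. */
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL) return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Releases the storage and resets the vector to its empty state. */
static void int_vec_free(int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

The one detail that is easy to get wrong is assigning the result of realloc straight back to v->data, which loses the original block if the call fails.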

ambrop712y ago

Intrusive data structures are indeed more powerful. I made my own intrusive AVL-tree which can be found here [1] and an example here [2]. There's also the extra feature that the concept of a "link" is abstracted, so you can for example build a compressed AVL-tree inside an array using array indices instead of pointers, which don't break when the array is reallocated. It's also built in a different way than this is usually done (macros), that is, a header file is included which redefines some identifiers, and the actual code of the functions is not inside macros, hence, easier to read.

About strings, I firmly believe that C's zero-terminated strings should be avoided and only used when necessary to communicate with existing code. They're really not simple and fast. You can't take a null-terminated string and extract a null-terminated sub-string without modifying the original - which is very often needed during various kinds of parsing. Also, null-terminated strings are unable to represent the zero byte, so in general, for example, you can't use them to store the contents of an arbitrary file, and if you do use them for purposes where null bytes can appear, you have to be careful and handle errors where nulls would implicitly truncate your string.

Just use (pointer, length) strings instead. It'll save you a whole bunch of trouble you may not see coming.

[1] https://code.google.com/p/badvpn/source/browse/#svn%2Ftrunk%... (CAvl_)

[2] https://code.google.com/p/badvpn/source/browse/#svn%2Ftrunk%... (cavl_test_)
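A minimal sketch of the (pointer, length) representation described above. The str_slice type and its helpers are made up for this example and are not taken from badvpn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A non-owning view of bytes: no terminator needed, zero bytes allowed. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

static str_slice slice_from(const char *p, size_t len) {
    str_slice s = { p, len };
    return s;
}

/* Returns a sub-slice, clamped to the bounds of s.
 * The underlying bytes are neither copied nor modified. */
static str_slice slice_sub(str_slice s, size_t start, size_t len) {
    if (start > s.len) start = s.len;
    if (len > s.len - start) len = s.len - start;
    return slice_from(s.ptr + start, len);
}

/* Byte-wise equality, including any embedded zero bytes. */
static int slice_eq(str_slice a, str_slice b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Splitting "key=value" into a key slice and a value slice this way leaves the original buffer untouched, which is the parsing case where null-terminated strings force either a copy or an in-place write.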

mtdewcmu12y ago

I don't see anything inherently broken about null-terminated strings. It means that the smallest string that can be represented is a single byte, and the largest string that can be represented is unlimited. By mixing in an integer, you're expanding the smallest string and placing an arbitrary limit on the largest string. If the integer and string data are mixed in a struct, then strings pack much less tightly, not just because of the space taken up by the integer, but also because it needs to be aligned. Most of the things you'd want to do with a string are O(n) anyway, like print to the terminal. Null-terminated strings aren't perfect for every use, but no representation is, and C doesn't prevent you from using a different representation when it's more appropriate. If you're writing a parser, then that's a more demanding application than usual, and you ought not feel locked into using the default representation.

mtdewcmu12y ago

"For some reason, I don't find myself using resizable array utility code much in C."

Most of the time, the amount of memory you'll need to do something is predictable. When dynamic arrays are always at hand, you lose the habit and end up always reaching for one.

wrl12y ago

Should you ever need said macros, I've written a set: https://github.com/wrl/wwrl/blob/master/vector.h

cmccabe12y ago

Thanks, that's interesting. I would probably add some kind of callback to free individual elements if I ended up using this. A lot of the boilerplate when managing resizable arrays in C ends up being making sure you properly dispose of elements. Or at least it was for me when I last did this.

theck01OP12y ago

Sadly some form of OBString is a necessity: any string object needs a reference count and some function pointers packaged in the struct to work. Converting OBString to have the same interface as the standard string handling functions is a good idea though, and in that case the standard functions would also serve as the backend.

ExpiredLink12y ago

> You might also want to look at queue.h from BSD

CPP macros, entirely. No, thanks.

pmjordan12y ago

I'm not a fan of macros either, which is why I've ended up implementing hybrid C data structures, with the logic in functions, and some optional type helpers in macros:

https://github.com/pmj/genccont

Anyway, these are largely intrusive and don't do reference counting, but that can be seen as an advantage or disadvantage, depending on the situation. (I use these heavily in kernel code) The hash tables (chaining and open addressed linear probing) use function pointers to be type-generic, which might not be to everyone's taste either.

nitrogen12y ago

Just as an anecdote, in one of my university classes we had the assignment of writing and optimizing a memory allocator. When I switched from using structs to using pointer arithmetic in macros, I gained a significant performance boost, even though the in-memory data was exactly the same.
