Yours looks quite a bit more finished. One thing that would be very useful is handling Python to Postgres type conversion, like PL/Python does. After that, the next step is caching I/O functions for the duration of the scan, which PL/Python also does.
Nice to see that the idea made sense for more than one person, hope Multicorn will rock!
We currently have some very rudimentary Python to Postgres type conversion, but this area still needs a lot of improvement.
You should release your code, I'm sure you have a wide range of ideas worth merging into Multicorn!
But what kind of performance is it possible to get with complex queries on these external sources? Does the PG server need to load all the data in memory from the source to do filters, joins and sorts?
The PostgreSQL plan is parsed and passed to the wrapper as a list of "quals", objects representing simple filters. As an implementer, you don't HAVE to enforce those, since PostgreSQL will recheck them for you anyway, but they can be quite handy.
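To make the "you don't have to enforce them" point concrete, here is a minimal sketch in plain Python. The Qual tuple and its attribute names (field_name, operator, value) are stand-ins I made up for illustration, not necessarily what Multicorn passes in. The scan applies only the operators it understands and silently skips the rest, which is safe only because PostgreSQL rechecks every row afterwards.

```python
from collections import namedtuple
import operator

# Hypothetical stand-in for a qual object: one simple filter of the
# form "<field_name> <operator> <value>".
Qual = namedtuple("Qual", ["field_name", "operator", "value"])

# Operators this toy source knows how to apply itself.
SUPPORTED = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def scan(rows, quals):
    """Yield rows, applying only the quals we understand.

    Unsupported quals are ignored, so the result may be a superset of
    what the query asked for; the database is expected to recheck.
    """
    usable = [q for q in quals if q.operator in SUPPORTED]
    for row in rows:
        if all(SUPPORTED[q.operator](row[q.field_name], q.value) for q in usable):
            yield row


rows = [
    {"id": 1, "sender": "alice"},
    {"id": 2, "sender": "bob"},
    {"id": 3, "sender": "alice"},
]
# The second qual uses an operator the source does not handle; it is skipped.
quals = [Qual("sender", "=", "alice"), Qual("id", "~~", "%3%")]
result = list(scan(rows, quals))
```

Here result keeps the two rows sent by alice. The skipped qual costs only some extra rows handed back to the server, never a wrong answer.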
For example, you can look at the IMAP foreign data wrapper (https://github.com/Kozea/Multicorn/blob/master/python/multic...) to see how the conditions from PostgreSQL are converted to an IMAP filter, allowing for server-side filtering.
The required columns are also provided, so if you don't need the email payload, the foreign data wrapper will not fetch it.
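A rough sketch of both ideas, not the actual code from the IMAP wrapper: the Qual tuple, the column names (subject, sender, recipient, payload) and the helper functions are all hypothetical. It turns equality quals into an IMAP SEARCH string and picks a FETCH item based on the requested columns. IMAP header searches are substring matches, so the server may return more than an exact "=" would; that is fine as long as PostgreSQL rechecks.

```python
from collections import namedtuple

# Hypothetical stand-in for a qual object.
Qual = namedtuple("Qual", ["field_name", "operator", "value"])

# Assumed mapping from column names to IMAP SEARCH keys.
SEARCH_KEYS = {"subject": "SUBJECT", "sender": "FROM", "recipient": "TO"}


def quote(value):
    """Return value as an IMAP quoted string."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_search(quals):
    """Translate the quals we can push down into IMAP SEARCH criteria."""
    parts = []
    for qual in quals:
        key = SEARCH_KEYS.get(qual.field_name)
        if key is not None and qual.operator == "=":
            parts.append("%s %s" % (key, quote(qual.value)))
    # Nothing to push down: ask for every message.
    return " ".join(parts) if parts else "ALL"


def build_fetch(columns):
    """Fetch the full message only when the payload column is requested."""
    if "payload" in columns:
        return "(BODY.PEEK[])"
    return "(BODY.PEEK[HEADER])"


criteria = build_search([
    Qual("sender", "=", "alice@example.com"),
    Qual("size", ">", 1000),  # not pushed down in this sketch
])
fetch_spec = build_fetch({"subject", "sender"})
```

The two strings could then be handed to something like imaplib's search and fetch calls. A query that selects only subject and sender never downloads message bodies.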
For joins, it will depend on the execution plan. There is still plenty of work to do on parsing the PostgreSQL execution plan into something more useful, but the current set of "optimizations" is sufficient for our main use cases.
gcc -g -O2 -fPIC -fPIC -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv -g -lpythonpython2: -fpic -L/usr/lib -Wl,-Bsymbolic-functions -Wl,--as-needed -Wl,--as-needed -Wl,--as-needed -lpythonpython2: -shared -o src/multicorn.so src/multicorn.o
/usr/bin/ld: cannot find -lpythonpython2:
/usr/bin/ld: cannot find -lpythonpython2:
collect2: ld returned 1 exit status
make: *** [src/multicorn.so] Error 1
rm src/multicorn.o
ERROR: command returned 2
This is because Ubuntu has both python and python2.7 binaries. I fixed this by creating a symlink for python.
After installation, another problem:
$ psql
psql (9.1.1)
Type "help" for help.

leo=# CREATE EXTENSION multicorn;
ERROR: could not load library "/usr/lib/postgresql/9.1/lib/multicorn.so": /usr/lib/postgresql/9.1/lib/multicorn.so: undefined symbol: _Py_NoneStruct
All tested on a system created specially for this:
$ python -V
Python 2.7.2+
$ psql -V
psql (PostgreSQL) 9.1.1
contains support for command-line editing
This is where my tests end.
PS: if you're the 'leopard' who requested a redmine account, it should be activated now, feel free to report it there.
Will this work?
I was thinking of downloading the whole thing via POP3, creating a Unix mailbox and indexing that.
What is your use case, exactly?