I have built a different table library for Jupyter called buckaroo. My approach has been different. Buckaroo aims to let you interactively cycle through different formats and post-processing functions to quickly glean important insights from a table. I took the view that since I type the same commands over and over to perform rudimentary exploratory data analysis, those commands and insights should be built into the table.
Great Tables seems built so that you can manually format a table for presentation.
What are your thoughts on Visidata's hotkeys and controls? I used Visidata in the past and always wondered why it couldn't be added into Jupyter (eventually) for dataframe explorations.
>It looks like they are almost building a "grammar of tables" similar to a grammar of graphics.
Agreed that Great Tables seems to be taking another crack at formalizing a "grammar of tables", and I welcome this approach too given the power of tabular formats and the wider adoption of the dataframe concept via the R/pandas/Arrow/polars ecosystem, although I believe the term "dataframe" dates back to the statistical S language in the 90s[1].
[1] https://towardsdatascience.com/preventing-the-death-of-the-d...
The other feature I have played with in this area is auto-cleaning. Auto-cleaning looks at individual columns and emits cleaning commands to the low-code UI. Different cleaning strategies can be implemented and toggled through.
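To make the idea concrete, here is a minimal sketch of per-column cleaning strategies that emit commands and can be swapped. It is not Buckaroo's actual API: the command names (`to_int`, `null_non_numeric`), the strategy names, and the 80% cutoff are all assumptions made up for illustration.

```python
from typing import Callable

# A cleaning command is a plain (operation, column) tuple.
# The operation names are hypothetical, not taken from any real library.
Command = tuple[str, str]
Strategy = Callable[[str, list], list[Command]]


def _is_int_like(value) -> bool:
    return str(value).strip().lstrip("-").isdigit()


def conservative(name: str, values: list) -> list[Command]:
    """Only suggest an int conversion when every non-null value parses."""
    present = [v for v in values if v is not None]
    if present and all(_is_int_like(v) for v in present):
        return [("to_int", name)]
    return []


def aggressive(name: str, values: list) -> list[Command]:
    """Suggest a conversion when most values parse, nulling out the rest."""
    present = [v for v in values if v is not None]
    if not present:
        return []
    numeric = [v for v in present if _is_int_like(v)]
    if len(numeric) / len(present) < 0.8:  # arbitrary cutoff for the sketch
        return []
    commands: list[Command] = []
    if len(numeric) < len(present):
        commands.append(("null_non_numeric", name))
    commands.append(("to_int", name))
    return commands


STRATEGIES: dict[str, Strategy] = {
    "conservative": conservative,
    "aggressive": aggressive,
}


def suggest_cleaning(table: dict[str, list], strategy: str) -> list[Command]:
    """Run one strategy over every column and collect the emitted commands."""
    commands: list[Command] = []
    for name, values in table.items():
        commands.extend(STRATEGIES[strategy](name, values))
    return commands


# Made-up sample data: one dirty column, one clean column.
table = {
    "age": ["31", "45", "n/a", "27", "52", "38"],
    "weight": ["180", "175", "210", "160", "195", "150"],
}
print(suggest_cleaning(table, "conservative"))
print(suggest_cleaning(table, "aggressive"))
```

Toggling the strategy name changes which commands come out, and a UI could then show those commands for the user to accept or edit.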
Buckaroo takes the view that being opinionated is good, so long as you can toggle through opinions to get the right combination of cleaning, display, or post-processing that you are looking for quickly. All of the features of buckaroo are also built to be easily extendable by users.
This feature saw very little use, so I haven't developed it much (I had to disable it after some refactorings). The low-code UI is demonstrated at the end of the YouTube video linked above.
The top and bottom horizontal rules on the Title appear to be superfluous, and I dislike how it is aligned with the first column (row labels) rather than the second. I feel like a little space to breathe at the bottom, along with a bold font, would add visual hierarchy w/o the clutter.
The row label backgrounds are far too dark and the font weight makes it hard to read. I'd prefer a very light blue here instead. I don't like the row group label ("Name") being italicized.
The spanner labels floating in the centre make the table hard to scan. Would be much nicer aligned left.
Finally, I really dislike the font (maybe this is just my browser, though).
I mocked up some of the changes here; I think this is a much easier-to-read table:
If you've seen sparklines, [2] Tufte coined the term.
Whenever I do a UI review I end up paging through it just to see if there's something we're not thinking about, and it's an interesting book to just open to a random page and read.
Plus he has an entire treatise on why PowerPoint is terrible.
As someone trying to build a PowerPoint competitor, this is awesome. I'm going to start here and work my way through his whole corpus
In contrast the census manual chooses to center almost all labels within their box, and when not it is almost always due to indentation, and moreover is unafraid to set column widths to fit the data not the labels, with indent and hyphenation to match. The result is both horizontally compact and intuitively comprehensible.
edit: on further reflection I also think it’s a crappy title. Titles and captions should convey context, scope, purpose - and may otherwise be omitted entirely for the editorial sin of failing to justify their own existence. As given, this one could be retitled “Table 1” with no loss of information or generality. For an article that’s trying to discuss and reformulate tabular presentation from first principles, that’s a tad disappointing. Since table titles form a crucial layer of their information catalogue, it is hardly surprising that the census manual devotes an entire chapter to the matter of title construction, and even though somewhat domain specific and archaically worded it is well worth the visit
More recent history involves the production of CALS tables https://en.wikipedia.org/wiki/CALS_Table_Model. The company Datalogics https://en.wikipedia.org/wiki/Datalogics was heavily involved in the CALS table initiative. Datalogics staff was part of the ISO committee forming SGML, and trained many people on SGML, including DoD staff and their contractors involved with documentation.
I was involved with the team that produced an editor for SGML-based documents. It had as one of its features the ability to specify the formatting of an element based on the SGML context of that element. This was before XSLT and its kin.
Alumni of Datalogics helped Microsoft learn about XML ("No, you can't arbitrarily switch case on XML element tags").
Also TeX practitioners have pretty well-formed opinions about how tables should be formatted.
Odd side-note: I learned that the documentation for a fighter airplane of the time, if printed out, would weigh more than the aircraft and would fill a football-field sized collection of filing cabinets.
And as much as many today don't like XML, coming from the SGML world it is a boon.
Indeed. I don’t think it’s all correct, though. On Visicalc, it says “The grid cells couldn’t be styled with borders for presentation purposes, the values couldn’t be formatted, and the tables couldn’t even be printed”
I think even the first version had (limited, of course) formatting support.
http://www.bricklin.com/history/refcard3.htm says the “/F” command allowed setting justification and setting the number format to, for example, dollars and cents.
It is for version 1.35, but I think even the first version shipped supported at least showing dollars and cents.
While the article is overall good, there is a bunch of history that is not covered, or covered too briefly.
E.g.:
Cost
$1500
$130
$110
$210
The text in the last three rows looks 4/5ths the size of the text in the first row. However, even if summed, the last three costs add up to only about 1/3rd of the top row! People visually see the number of digits, which is roughly the same as log10.
I’ve so often had this issue that I started putting in-cell bar charts into every finance-related spreadsheet.
Otherwise meetings will get derailed debating the cost of something trivial that is totally irrelevant compared to the biggest absolute costs.
As a real example, I had many meetings spent debating a $15 monthly cost for server log collection in the cloud for a VM running a database engine that costs $15K monthly for the license alone.
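A quick sketch of the effect, using the four costs from the example table. The `in_cell_bar` helper and the 20-character bar width are my own assumptions for illustration; a spreadsheet would use its built-in data bars instead.

```python
import math

# The four costs from the example table above.
costs = [1500, 130, 110, 210]


def in_cell_bar(value: float, largest: float, width: int = 20) -> str:
    """Return a text bar whose length is linear in value, not in digit count."""
    return "#" * round(width * value / largest)


largest = max(costs)
for cost in costs:
    digits = len(str(cost))
    print(f"${cost:>5}  digits={digits}  log10={math.log10(cost):.2f}  "
          f"{in_cell_bar(cost, largest)}")

# Share of the biggest cost taken up by the three small ones combined.
print(sum(costs[1:]) / costs[0])
```

The digit counts are 4 versus 3, while the bars make the linear difference obvious.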
I just wanted to say that Rich is the only software developer I know, who when asked to lay out the philosophy of his package, would give you 5,000 years of history on the display of tables. :)
It makes me wonder how we've gone this long with increasingly poor data table presentations (the mid-century modern tables are astutely pointed to as shining examples).
This makes me excited to get back into data analysis with python. Moreover, I see some possible API improvements and extensions I'd like to make.
The designers of Great Tables might want to check out TPL. It covers everything Great Tables aims to do, and I think may have a few more tricks up its sleeves:
https://www.ojp.gov/pdffiles1/Digitization/68013NCJRS.pdf
Regardless, thanks for making Great Tables! This goes a long way towards making table producing in python much better.
People from Show HN - watch and learn.
Interesting aside: AI models trained on spreadsheets need "good tables" such as column names, headers, etc. to understand context. Like Fortap: https://arxiv.org/abs/2109.07323
I just made a table this morning for Calc II notes. The first column says something like $f'(x)$ in the first row and $f'(0)$ on the second. The table body lists values for different functions, one per column. I put in a column rule separator because the leftmost column seems separate from the others.
In any event, I'm suspicious of rules (pun noted).
And they all suck.
I don’t want to have to use some JS library component just to show tabular data, especially given how badly they perform once the data gets big, whereas a server-side rendered HTML table can be enormous and render fine. But again, so limited.
The next performance gain for web tables comes from using a binary encoding instead of JSON, particularly arrow. Perspective uses arrow (in addition to rendering to canvas).
IME building buckaroo on top of ag-grid, I can render the table with up to about 300k elements very performantly with just JSON. Rendering speed is a non-factor because only 50 rows are rendered at a time. Moving to arrow-js should be about 3 times faster for the entire system (python serialize, js deserialize, js render). Beyond 900k elements, you really want to lazily load from the server as the user scrolls, because the memory usage for just the data in the browser tends to slow things down. (I am working on a library and benchmark for different serialization techniques.)
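For the lazy-loading part, the server side can be as simple as serializing only the window of rows the grid is asking for. This is a minimal sketch, not ag-grid's or buckaroo's actual protocol: `fetch_window`, `make_rows`, and the payload keys are hypothetical names.

```python
import json


def make_rows(n: int) -> list[dict]:
    """Build a made-up in-memory table to stand in for a real dataframe."""
    return [{"id": i, "value": i * i} for i in range(n)]


def fetch_window(rows: list[dict], start: int, size: int) -> str:
    """Serialize only the requested slice of rows.

    The total row count is included so the client can size its scrollbar
    without ever holding the full table in browser memory.
    """
    start = max(0, start)
    window = rows[start:start + size]
    return json.dumps({"total": len(rows), "start": start, "rows": window})


rows = make_rows(10_000)
payload = json.loads(fetch_window(rows, start=100, size=50))
print(payload["total"], len(payload["rows"]), payload["rows"][0])
```

The client would call this again with a new `start` each time the user scrolls past the rows it already has.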
such libraries often mess the scrolling and searching up
Due to their regular structure, tables would provide an opportunity for HTML implementations to optimize and greatly reduce that memory usage.
- now the dates are not vertically aligned, and
- the weights have repetition of units (lbs) (although losing the decimal is an improvement)
- the name column is way too visually "heavy", that's a style you'd reserve for a header, a simple bold would suffice
They've also retained other issues from the original like city not being a city, but a combo of city and region, or similarly no separation of the first name of a person, which is especially important for a diverse group of people
There's also of course LaTeX (mentioned in a couple other comments here), which has "ordinary" tables and long tables (tables that span more than one page).
Hope to see dplyr and ggplot someday on Python.
Tables, when you really care, are so very difficult to get right. Sometimes you really want to densely compact the data to communicate all details to others who are already deeply invested in the dataset, sometimes you need to remove all but the most essential information to get across a single idea with clarity, and then there is a continuum of variations between those extremes. The problem expands into more dimensions when the medium becomes a consideration: you simply can't (and should not) use the same approach when dealing with PDFs vs HTML vs a slide deck. On top of all that, you often have a personal style that you want to get across, or have a style that you need to comply with in some inflexible way.
I like how gt just NAMES the parts of a table in their docs (and in the schematic in the article). This is a problem where agreeing on what things are called makes a ton of difference in usability.
https://people.inf.ethz.ch/markusp/teaching/guides/guide-tab...
One thing in particular I'm interested in but could not see an example for is if this will let you insert "break lines" i.e. for displaying sub totals and similar.
For example based on the demo, which shows names and addresses from census data it might be nice to be able to break at each change in postcode and display some summary data like a count of people found at that postcode or an average age (based on the DOB) living at that postcode or similar.
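To show what I mean by break lines, here is a rough sketch in plain Python, independent of Great Tables (I don't know whether or how it supports this). The records, the `rows_with_breaks` helper, and the use of a ready-made `age` field instead of a DOB are all assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample records; a real table would derive age from the DOB.
people = [
    {"name": "Ann", "postcode": "10001", "age": 30},
    {"name": "Raj", "postcode": "94105", "age": 50},
    {"name": "Bob", "postcode": "10001", "age": 40},
]


def rows_with_breaks(records, key):
    """Yield display rows, inserting a summary row at each change of `key`."""
    ordered = sorted(records, key=itemgetter(key))
    for value, group in groupby(ordered, key=itemgetter(key)):
        members = list(group)
        for member in members:
            yield ("row", member)
        ages = [m["age"] for m in members]
        yield ("break", {key: value,
                         "count": len(members),
                         "mean_age": sum(ages) / len(ages)})


for kind, data in rows_with_breaks(people, "postcode"):
    print(kind, data)
```

A table renderer would then style the `break` rows differently, for example with a rule above and bold text.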
Otherwise, conditional formatting is another pain point: either using rules (e.g. if the value in column B is greater than a specified threshold, make the entire row bold) or automatically creating a color gradient to highlight the cells à la Excel.
For bonus points, management types like things like red and green traffic lights (or down/up arrows) that you can display next to KPI data in a table. It's a gimmick, but it wins you points.
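Roughly what I have in mind, sketched as plain HTML generation rather than any particular library's API. The function names, the white-to-green color ramp, and the arrow glyphs are my own choices for illustration.

```python
from html import escape


def gradient(value: float, low: float, high: float) -> str:
    """Map value linearly onto a white-to-green background color (hex)."""
    t = 0.0 if high == low else min(1.0, max(0.0, (value - low) / (high - low)))
    # Interpolate each channel from white (255, 255, 255) to green (0, 160, 0).
    r = round(255 + t * (0 - 255))
    g = round(255 + t * (160 - 255))
    b = round(255 + t * (0 - 255))
    return f"#{r:02x}{g:02x}{b:02x}"


def indicator(value: float, target: float) -> str:
    """Up arrow when the KPI beats its target, down arrow otherwise."""
    return "▲" if value > target else "▼"


def render_row(label: str, value: float, *, threshold: float,
               low: float, high: float) -> str:
    """Render one table row: bold if above threshold, shaded by value."""
    weight = "bold" if value > threshold else "normal"
    return (
        f'<tr style="font-weight:{weight}">'
        f"<td>{escape(label)}</td>"
        f'<td style="background:{gradient(value, low, high)}">{value}</td>'
        f"<td>{indicator(value, threshold)}</td>"
        "</tr>"
    )


print(render_row("Uptime", 90, threshold=80, low=0, high=100))
print(render_row("Signups", 10, threshold=80, low=0, high=100))
```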
(Disclosure: Quarto dev here) ..., like Quarto. You can use `great_tables` in code cells in Quarto to get great tables in your RevealJS presentation or website, https://quarto.org/docs/output-formats/html-code.html.
It would be nice to add some interactivity features to the tables, like ActiveAdmin in Rails.
However, this article gave me some really interesting and valuable background material and then concluded with solid, crystal clear (even to me, who's never written Python) examples. I actually came away thinking, "well, this looks fun - I should spin up some toy project to play around with this and learn how to use it."
You might read the famous Edward Tufte book, https://www.edwardtufte.com/tufte/books_vdqi
Creating beautiful tables both UI- and UX-wise, with some features being e.g. dropping separators between columns (rows?), adding some visual accents, etc.
And yes, the most distinguished feature was that the tables weren't looking like your busy PowerPoint non-tech organisation stuff, they were very modern yet simple.
I don't remember the specifics, but I was really impressed and regret not bookmarking the article.
Does anyone know of the article in question, and maybe could share the link?
Many websites build tables out of divs, and they may look like tables, but they are hard to manipulate/export data from.
If a table is on a website, I feel it should be easy to export it as csv, at the least, to "free the data" :D
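When the markup is a real `<table>`, the export is a few lines with nothing but the standard library. This is a minimal sketch that assumes a single, non-nested table and ignores `rowspan`/`colspan`; the class and function names are mine.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of each <td>/<th> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def table_to_csv(html: str) -> str:
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


html = ("<table><tr><th>Name</th><th>City</th></tr>"
        "<tr><td>Ann</td><td>Paris, FR</td></tr></table>")
print(table_to_csv(html))
```

With div-based layouts there is no such structure to walk, which is exactly the problem.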
[0] http://mirrors.ctan.org/macros/latex/contrib/booktabs/bookta...
All approaches I’ve seen here have some issues; worst of all being “press shift to sort by multiple columns” (not touch friendly).
Of course, this isn’t intuitive in the sense of being self-evident, but once you know this method, it basically works almost everywhere.
This article is about a Python library called “Great Tables” that is focused on the display of tables for publication and presentation (not for interactive browsing).
The article does not specify which output format it supports.
Also you get some bonus historical context on tables.
I'm one of the maintainers of Evidence (open source tool based on markdown + SQL) and working on a similar approach to creating presentation tables configurable in code.
Some examples here for any SQL + table enthusiasts: https://docs.evidence.dev/components/data-table
Also, in documents all images and tables should have descriptive captions. So their header with title and subtitle would be redundant.
Perspective seems to be the most performant html table. It is more focused on extremely fast updates than styling, although it looks good.
Glide is a newcomer that also renders to canvas.
> confronted with an all-too-familiar dilemma: copy your data into a tool like Excel to make the table, or, display an otherwise unpolished table.
One add-on (coming from the past 4 years of working on a startup for tabular data in Python [1]) is that users aren't just copying data into Excel because of its good formatting capability: very often, there are organizational constraints that mean that Excel _needs_ to be where this data ends up.
The most common reasons I've seen for data ending up in Excel:
1. Other parts of the report rely on Excel features: you want to build pivot tables or graphs in Excel (often, these are much easier to build in Excel than in Python for anyone who isn't a real Pythonista).
2. The report you're sending out for display is _expected_ in an Excel format. The two main reasons for this are organizational momentum, or that you want to let the receiver conduct additional ad-hoc analysis (Excel is best for this in almost every org).
The way we've sliced this problem space is by improving the interfaces that users can use to export formatting to Excel. You can see some of our (open-core) code here [2]. TL;DR: Mito gives you an interface in Jupyter that looks like a spreadsheet, where you can apply formatting like Excel (number formatting, conditional formatting, color formatting) - and then Mito automatically generates code that exports this formatting to an Excel file. This is one of our more compelling enterprise features for decision makers that work with non-expert Python programmers, since getting formatting into Excel is a big hassle.
Of course, for folks who can ditch Excel entirely, this is entirely unnecessary. Great Tables seems excellent in this case (and anyone writing blog posts this good is probably writing good code too... :) )
[2] https://github.com/mito-ds/mito/blob/dev/mitosheet/mitosheet...
I've had success generating svg visuals and placing them in slides, which PPT treats as a "shape" (the Graphics Format ribbon appears), and business users like that they can modify the shapes (for example, change the color). Great Tables supports pdf export, but not svg. I just tested a pdf vector in the current version of PPT, and while it maintains the vector, PPT won't let me convert it to a shape (only the Picture Format ribbon is available). So with Great Tables there would need to be an additional pdf -> svg conversion step.
Unfortunately, the API design in the example is just not very good:
    (
        GT(simple_table, rowname_col='Name')
        .tab_header(title='Names, Addresses, and Characteristics of Remote Correspondents')
        .tab_stubhead(label=md('*Name*'))
        ...
    )
I'm uncertain if it's trying to mimic something in another language like R (or some grammar of graphics thing or D3.js). Hopefully, it's not trying to mimic the look of long, chained `pandas.DataFrame` operations (because it misses the point of why those look the way they do).
Of course, for ad hoc, in-a-notebook, cut-and-paste/written-from-scratch use, the API design doesn't really matter that much. Usually, users will readily memorise the required incantations then fiddle with the result until they get what they want or they give up.
It's probably the case that for most tools that produce visual outputs, a majority of users are creating things in this style. (There are, e.g., millions of casual Matplotlib users out there.) But programmatic use is not too far off. Tools that produce visual outputs (even those as formally rigid as display tables) are often subject to consistency requirements, which directly implies programmatic use.
So, when I discover that my colleagues and I have six tables across three notebooks that need a consistent look, and I decide to interact with this tool programmatically, am I expected to write…?
    def standard_table(source, /, rowname_col, header_title, stubhead_label, weight_columns):
        return (
            GT(source, rowname_col=rowname_col)
            .tab_header(title=header_title)
            .tab_stubhead(label=md(f"*{stubhead_label}*"))
            .fmt_integer(columns=weight_columns, pattern="{x} lbs")
            ...
        )

    standard_table(simple_table, rowname_col='Name', header_title='Names, Addresses, and Characteristics of Remote Correspondents', stubhead_label='Name', weight_columns='Weight')
Or maybe…?
    def format_table(tbl, /, weight_columns):
        return (
            tbl
            .tab_stubhead(label=md(f"*{tbl.stubhead.label}*"))  # what if not present?
            .fmt_integer(columns=weight_columns, pattern="{x} lbs")
            ...
        )

    tbl = (
        GT(simple_table, rowname_col='Name')
        .tab_header(title='Names, Addresses, and Characteristics of Remote Correspondents')
        .tab_stubhead(label='Name')
        ...
    )
    format_table(tbl, weight_columns='Weight')
Or maybe…?
    class StandardTable(GT):
        def tab_stubhead(self, *a, **kw):
            # inspect.signature(...).bind(...) # ...
            return super().tab_stubhead(*a, **kw)

    StandardTable(...)
These aren't great options. The API design is just not very good.
Just look at the first example on the nushell front page (https://www.nushell.sh/): could that not look better with "Great Tables" or something similar?
Strange that they did not know (or credit) booktabs, the LaTeX package that has popularized this table design since 2003.
“The democratization of computational tables arguably began with VisiCalc in 1979… I mean, try it out and you’ll see that this is quite limited in more than a few ways.”
Them’s fightin’ words. IMHO VisiCalc’s ability to generate models quickly changed civilization. It freed people to try out ideas at no cost and to view or manipulate data in ways no one could hope to do before.