The reality is that most applications and websites don’t expose enough context about the what of what you’re actually doing for AIs to be able to meaningfully infer from natural language the steps required to complete a given task.
We humans are very good at filling in the blanks based on if we’re working in Photoshop or VS Code or Excel. We infer a lot of context from the specific files we’re working on or the particular client or even the files’ organization within the file system, or even what month or day it is.
I am skeptical that models will be able to replicate a complex workflow when there’s very little in the way of labels and UI controls even visible.
I know a weekly spreadsheet from a monthly and quarterly, etc. I know the minutiae about which options to use to generate the specific source reports, etc.
Workflows can be quite complex, no matter your role.
I mean I can just see it now: gift receipts being sent to the recipient before their birthday, internal draft proposals prematurely sent to clients, mixing up clients or commingling their data, overwriting or losing data; this whole thing just screams disaster. And I’m not even thinking about people involved with safety, or finance, or legal/regultory, or medical. Law enforcement?
This kind of thing can be done properly with well defined interfaces, common standards, and reasonable and prudent guardrails.
But it won’t be. It’ll be YOLOed on a paper thin training budget and it’ll be like your own little personal chaos monkey on ketamine.