As one myself LLMs don't feel like giving up control it feels like collaboration. You probably don't like collaboration.
When I interact with an LLM it feels like I'm speaking to someone who lacks inherent conceptual understanding, but is doing their utmost best to bullshit me. This is problematic when it comes to algorithmic thinking.
I sometimes need to solve problems that involve tweaking existing algorithms in somewhat novel ways. For example "k-medoid clustering but with various capacity constraints on clusters", or "contraction heirarchies but accounting for varying vehicle dimension restrictions".
I love collaborating with smart colleagues to come up with ideas to solve these problems. Seeing chat-gpt fail to solve very simple traveling salesman type problems gives me little confidence that it can do novel conceptual tihnking.
EDIT
To test my own claims, I just tried asking Chat-GPT to make this modification to k-means clustering. It produced something that will give a result but will likely not be very optimal. When prompted to refine the solution, it failed to understand the issue and instead proposed the exact same solution just with a bit more implementation detail.
User
Suggest a modification to the k-means algorithm where the number of elements assigned to each cluster is limited to a pre-defined maximum value.
ChatGPT
To modify the traditional k-means algorithm to limit the number of elements assigned to each cluster to a pre-defined maximum value, you can create a variation of k-means called "k-means with cluster size constraints." This modification ensures that no cluster exceeds the specified maximum size while still optimizing cluster centroids based on the sum of squared distances. Here's a high-level overview of the modified algorithm:
1 Initialize centroids: Start by randomly selecting K initial centroids, where K is the desired number of clusters.
2 Assign data points to clusters: Assign each data point to the nearest centroid based on Euclidean distance, just like in the traditional k-means algorithm.
3 Update centroids: Recalculate the centroids of each cluster by taking the mean of all the data points assigned to that cluster.
4 Check cluster sizes: After updating the centroids, check the size of each cluster. If any cluster exceeds the pre-defined maximum size, proceed to reassign some data points from that cluster to other clusters. Here's how you can do this: Identify the cluster(s) that exceed the maximum size. For each over-sized cluster: Sort the data points in that cluster by their distance to the cluster centroid in ascending order. Start reassigning the farthest data points to other clusters (the ones with the smallest distances) until the cluster size constraint is satisfied. Update the centroids of the affected clusters accordingly.
5 Repeat steps 2 and 3: Continue iterating between data point assignment and centroid updating until convergence, or until a predefined number of iterations have been reached.
6 Termination: The algorithm terminates when either the cluster assignments do not change significantly between iterations or after a fixed number of iterations.
Here are a few things to consider when implementing this modification:
You will need to experiment with the order in which you reassign data points from over-sized clusters to minimize the impact on the overall clustering quality.
You should set an appropriate stopping criterion to prevent the algorithm from running indefinitely.
Be prepared to handle situations where it is impossible to meet the cluster size constraint while maintaining good cluster quality. In such cases, you may need to relax the constraint or consider alternative clustering methods.
This modified k-means algorithm with cluster size constraints allows you to control cluster sizes while still finding meaningful clusters in your data.User
Your proposed solution has an issue that points might be reassigned even if there are clearly no other suitable clusters. Please tweak the solution to avoid this issue.
ChatGPT
To address the issue of potentially reassigning points even when there are no other suitable clusters, we can modify the approach by introducing a priority queue or heap data structure to efficiently manage the reassignment process. Here's an improved version of the k-means algorithm with cluster size constraints:
1 Initialize centroids: Start by randomly selecting K initial centroids, where K is the desired number of clusters.
2 Assign data points to clusters: Assign each data point to the nearest centroid based on Euclidean distance, just like in the traditional k-means algorithm.
3 Update centroids: Recalculate the centroids of each cluster by taking the mean of all the data points assigned to that cluster.
4 Check cluster sizes: After updating the centroids, check the size of each cluster. If any cluster exceeds the pre-defined maximum size, proceed to reassign data points from that cluster to other clusters. Here's how you can do this: Identify the cluster(s) that exceed the maximum size. For each over-sized cluster: Create a priority queue (or heap) to store the data points in the cluster, ordered by their distance to the cluster centroid in ascending order. Start reassigning the farthest data points to other clusters (the ones with the smallest distances) until the cluster size constraint is satisfied. Update the centroids of the affected clusters accordingly. Continue this process until either the cluster size constraint is met or there are no more suitable clusters to reassign points to.
5 Repeat steps 2 and 3: Continue iterating between data point assignment and centroid updating until convergence, or until a predefined number of iterations have been reached.
6 Termination: The algorithm terminates when either the cluster assignments do not change significantly between iterations or after a fixed number of iterations.
This modified approach ensures that points are only reassigned to clusters that are suitable and have room to accommodate additional points while maintaining the overall quality of the clustering solution. The use of a priority queue helps prioritize the reassignment of points to clusters with the least impact on their quality.
Additionally, you should set an appropriate stopping criterion to prevent the algorithm from running indefinitely, and you may need to handle situations where it is impossible to meet the cluster size constraint while maintaining good cluster quality by relaxing the constraint or considering alternative clustering methods.
Also what you're doing here is asking chatGPT for the answer. chatGPT is more effective via collaboration. Meaning instead of shoving the entire problem down it's throat and asking it to solve it, you ask it for advice. Ask it for bits and pieces of things.
A lot of front-end development for example does not require familiarity with any algorithms, formulae or mathematical structures. You might need to reason about large systems and have a rough idea of when hashmaps are useful, but the bulk of the work is constructing an interface according to functional business requirements. I frequently see comments here along the lines of "why am I being interviewed about algorithms when I'll never use them in my job".
A more mathematically oriented developer may be in the business of modelling and predictions. They may be confronted with a novel real world problem involving traffic, or trading, or electricity networks, that potentially no one has tried to solve before. They may be required to find a mathematical structure that closely approximates real world behaviour, implement that structure via code, and deploy a continuous model which allows their client to analyse and project.
Of course, you also have academic mathematicians using software like Maple or SageMath to assist with their research. This is another level more mathematical. Perhaps what you're getting at is that people can ask ChatGPT questions like "write me some Sage code to get the Delaunay triangulation of this set of points". I totally agree that it can probably do well at these tasks.
You also talk about things like traffic. Modelling traffic is mathematical? Sounds like a simulation to me. Man take a look at GTA. That game is almost entirely made by builder engineers creating simulations. It's the same shit and likely far more advanced then what any data scientist can come up with.
Anyway from your example and from what Ive seen it sounds like you're still doing the same thing. CS algorithms. You're just using algorithms that aren't likely very popular or very specific to data and stats. But adjusting stuff like k-means clustering still sounds like regular cs stuff to me.
There's no point in calling it more "mathematical" because it's not. The builder engineer who wrote all the systems in GTA or even say red dead redemption use a ton of "math" even and they don't term themselves more "mathematical" even though their simulations are likely more complex than anything you will ever build.
That's why when you called your self mathematically superior (again don't deny this.. we all know what you really mean here) I thought you were talking actual math. Because if you looked at a math equation it doesn't look anything like an algorithm. Math equations are written as a single expression. Math equations model the world according to a series of formula. It's very different to a cs algorithm.
Mathematical oriented programming involves largely the same thing and using algebras of mathematics.
If you're not doing this just call it data science instead of trying to call yourself more "mathematical". If you truly were more mathematically oriented you would know what I'm talking about.
Geeze some guy writing "models" and doing some applied math+stats like what every other freaking programmer out there is doing and he calls himself more "mathematically oriented."