I mean we didn't read a classic American author till 6th or 7th grade! And if I recall correctly there were still M&M's in math class in grade 4!
The US may have an education problem, but somehow the Soviet Union and China did fine years ago without all the ed-tech snake oil.
citation?
Education is a complex matter. There are many people with OPINIONS on what the best way to teach is. These ideas are in conflict, and only rarely does anyone study what really works. (Rarely compared to the number of opinions, anyway - there could be a lot of studies that people simply don't know about when they state their opinions.)
Humans have a limited lifetime: you cannot teach all possible useful knowledge/skills in a lifetime. I limited this to useful; there are a lot of useless things that are fun to know anyway, and somehow those who are interested need to find time to learn them for fun. I didn't define useful either: is Music/French/Algebra/Sports... useful? (I can make either argument for any subject.)
Why is reading a classic American author important? Reading is important in an abstract sense, but if you can understand written instructions it doesn't matter what you happened to read to get that skill.
Likewise, what is wrong with using M&Ms for learning math? A concrete example helps with learning. (To be clear, this is one of the opinions I was ranting against in the first paragraph - I don't know if I agree with it, but I understand it enough to repeat it.)
One constant in US popular culture is that our education system sucks compared to X. We have done well over the years despite that (or maybe because of it?).
http://www.businessinsider.com/pisa-worldwide-ranking-of-mat...
> I didn't define useful either:
> but if you can understand written instructions it doesn't matter what you happened to read to get that skill.
Of course you are free to define useful in a way that makes it impossible to argue or have a discussion. So let's stick to the way it is defined for the purposes of, say, university admission.
> Why is reading a classic American author important?
Reading difficult work earlier develops higher reading comprehension faster.
> Likewise, what is wrong with using M&Ms for learning math?
I think if by 4th grade you still need concrete pieces to understand integers or denominators of a fraction or whatever they were supposed to represent, that is a sign of a weak math education. In general, concrete examples are antithetical to learning advanced math; they lead to the monkey-style ability to solve problems that are similar to ones presented in textbooks, but not the ability to reason effectively about unfamiliar problems.
It helps that your graduate schools and corporations are full of people educated in other countries. Immigration is great.
This is because just pumping money into failing schools does not magically turn them around. There is little correlation between per capita secondary education spending and student outcomes.
Of course funding per pupil isn't correlated to outcomes. Funding per pupil normalized to their levels of needs and preparedness might be.
Once a certain amount of dollars is actually reaching the classroom, adding more dollars will simply see most of the additional funds absorbed by hiring more administrators, by prestige projects like sports facilities, by "classroom technology" projects, etc.
To detect this limit, simply check the level at which teachers begin paying for school supplies for their students from their own pockets and then back it off about 10%.
Surely "add on about 10%"?
In the Northeast US, you'll generally see the best performing districts have a lower amount spent per child than the underperforming districts.
The underperforming districts will have higher property taxes (as a result of the higher education cost). This generally leads to parents seeking to move to a different school district for financial and educational reasons.
In education, at least, more money does not equate to better students, but instead, more mismanagement.
This definitely needs a citation. It might not have significant correlation either way, but I cannot find a reference for the former (some cursory googling [0][1]).
[0] https://www1.udel.edu/johnmack/research/school_funding.pdf [1] https://object.cato.org/sites/cato.org/files/pubs/pdf/pa746....
You give a gifted student a $100 book and let them get after it.
You give a troubled behavior student with multiple LDs a full-time ed tech at $30k per year salary minimum, or whatever else is required, by federal law, to fulfill their IEPs.
Anyway, as I say again and again: there isn't one US education system. Within the District of Columbia, a populous but geographically small area, there are practically if not legally speaking six or seven at least: public schools, magnet; public schools prosperous; public schools shaky to desperate; parochial schools; private schools; charter schools. And within the parochial, private, and charter school worlds there are considerable differences.
This kind of data is commonly modeled using item response theory (IRT). I suspect that even in data generated by a unidimensional IRT model (which they are arguing against), you might get the results they report, depending on the level of measurement error in the model.
Measurement error is the key here, but is not considered in the article. That + setting an unjustified margin of 20% around the average is very strange. An analogous situation would be criticizing a simple regression, by looking at how many points fall X units above/below the fitted line, without explaining your choice of X.
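To see why measurement error alone can produce that kind of pattern, here is a minimal sketch (not the article's analysis; all parameters are made up): simulate a plain unidimensional Rasch model, then look at how different the answer patterns of equally-scoring students are purely from noise.

    # Sketch: unidimensional Rasch data still gives students with the same
    # total score quite different answer patterns, just from noise.
    import numpy as np

    rng = np.random.default_rng(0)
    n_students, n_items = 1000, 20
    ability = rng.normal(0, 1, n_students)      # one latent trait
    difficulty = rng.normal(0, 1, n_items)      # item difficulties

    # P(correct) under the Rasch (1PL) model
    p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
    responses = (rng.random((n_students, n_items)) < p).astype(int)
    totals = responses.sum(axis=1)

    # Take the most common total score and measure how dissimilar the
    # underlying answer patterns are (fraction of items answered differently).
    target = np.bincount(totals).argmax()
    group = responses[totals == target]
    diffs = [(a != b).mean() for i, a in enumerate(group) for b in group[i + 1:]]
    print(f"students scoring {target}/{n_items}: {len(group)}, "
          f"mean fraction of items answered differently: {np.mean(diffs):.2f}")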
The main point of this post is to highlight that the most common metric of student performance may not be that useful. Most of the time, students will get their score, the average score, and sometimes a standard deviation as well. As jimhefferon mentioned in a response to a different comment, the conventional wisdom is that two students with the same grade know roughly the same stuff, and that seems not to be true.
We're hoping to build some tools here to help instructors give students a better experience by helping them cater to the different groups that are present.
disclaimer: I'm one of the founders of Gradescope.
However, I'd say that the issue is more than having a non-rigorous analysis. It's the wrong analysis for the question your article tries to answer. In the language often used in the analysis of tests, your analyses are essentially examining reliability (how much students' scores vary on different test items due to "noise"), rather than validity (e.g. how many underlying skills did we test). Or rather, they don't try to separate the two, so they cannot draw clear conclusions.
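Concretely, the two questions get probed with different tools. A rough sketch of what I mean (function names and data layout are mine, not yours): something like Cronbach's alpha for reliability, and the eigenvalue spectrum of the item correlation matrix as a crude dimensionality check.

    # Sketch of the reliability-vs-dimensionality distinction (illustrative only).
    import numpy as np

    def cronbach_alpha(scores):
        """scores: students x items matrix; higher alpha = less item-level 'noise'."""
        k = scores.shape[1]
        item_var = scores.var(axis=0, ddof=1).sum()
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_var / total_var)

    def eigenvalue_spectrum(scores):
        """Eigenvalues of the item correlation matrix: one dominant eigenvalue
        suggests one underlying skill, several large ones suggest several."""
        corr = np.corrcoef(scores, rowvar=False)
        return np.sort(np.linalg.eigvalsh(corr))[::-1]

    # usage, given a students x items score matrix:
    # print(cronbach_alpha(scores), eigenvalue_spectrum(scores)[:5])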
I am definitely with you in terms of the goal of the article, and there is a rich history in psychology examining your question (but they do not use the analyses in the article for the reasons above).
The piece is kind of making a basic fundamental mistake in measurement, assuming that all variability is meaningful variability.
There are ways of making the argument they're trying to make, but they're not doing that.
Also, sometimes a single overall score is useful. A better analogy than the cockpit analogy they use is clothing sizing. Yes, tailored shirts, based on detailed measurements of all your body parts, fit awesome, but for many people, small, medium, large, x-large, and so forth suffice.
I think there's a lesson here about reinventing the wheel.
I appreciate the goals of the company and wish them the best, but they need a psychometrician or assessment psychologist on board.
We aren't trying to make a rigorous statement here -- we're trying to draw attention to the fact that the most common metrics do not give much insight into what a student has actually shown mastery of. This is especially important when you consider that the weightings of particular questions are often fairly arbitrary.
I certainly agree that all variability is not meaningful variability, but I'd push back a bit and say that there's meaningful variability in what's shown here. We'll go into more depth and hopefully have something interesting to report.
I've also seen a fair number of comments stating that this is not a surprising result. I'd agree (if you've thought about it), but if you look at what's happening in practice, it's clear that either many people would be surprised by this, or are at least unable to act on it. We're hoping to help with the latter.
All the worst students will be very similar and all the best students will be very similar because the number of available states is low. Average students are all unique in their average-ness.
Am I missing some subtle statistical understanding that the toy example doesn't capture?
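For what it's worth, the counting version of that intuition checks out for a test of n pass/fail questions (toy numbers, nothing deeper than binomial coefficients):

    # Number of distinct answer patterns for each total score on a 20-question
    # pass/fail test: tiny at the extremes, huge in the middle.
    from math import comb

    n = 20
    for k in (0, 1, 5, 10, 15, 19, 20):
        print(f"score {k:>2}/{n}: {comb(n, k):>7} possible answer patterns")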
I wondered about a very similar problem some weeks ago. I was bothered by the terms "ectomorph" and "mesomorph" because they seemed useless once you considered height: the vast majority of "ectomorphs" seemed to be taller than average while the vast majority of "mesomorphs" seemed to be of average height, so there's no point to these words. And so I wondered how shoulder width would change given height (which seems to have some kind of "decreasing returns"), and how the average measures would relate to actual average build. I mean, is the "average guy" really the guy with the average height and average shoulders? Because it's not as if the scale had just changed, like doubling the size of a cube, but there seems to be some deformation going on as well.
Anyway, didn't get past the wondering phase at the time. But I think it's too much of an important problem to be casually thrown as part of a pitch. I don't see an immediate reason why the average tuple should be the tuple of all averages, because some of the variables might be "dislocated" and thus not coincide with the averages of other variables. Some guy might be very close to average height yet still somewhere in the left-tail when it comes to body mass, shoulder width or any other measure. So there might be a typical student, but I don't think this is the way to find him.
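A quick way to see the effect (a toy simulation, assuming independent standardized measurements, which is not realistic for body proportions but shows the shape of the problem):

    # How many people are "close to average" on every measurement at once?
    import numpy as np

    rng = np.random.default_rng(1)
    n_people = 100_000
    band = 0.3  # "close to average" = within 0.3 standard deviations

    for n_dims in (1, 2, 5, 10):
        data = rng.normal(size=(n_people, n_dims))
        near_avg_all = (np.abs(data) < band).all(axis=1).mean()
        print(f"{n_dims:>2} measurements: fraction within {band} SD of average "
              f"on all of them = {near_avg_all:.5f}")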
Take the simple case of 2 dimensions (each observation is plotted in 2D space) with possible values of 0-10. Let's say the extreme (far from average) space is within 5% of the border. The total extreme area is (10x10)-(9x9) = 19 (i.e. 19%). Now add a 3rd dimension. The extreme "volume" in 3d space is now (10x10x10)-(9x9x9) = 271 (i.e. 27%). You can see where this is trending. Add enough dimensions, and every observation is now "extreme." They become so far apart that each observation almost deserves its own cluster, and you lose any idea of similarity.
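Continuing that calculation: with a 5%-of-the-range band on each axis, the interior fraction is 0.9^d, so the "extreme" fraction is 1 - 0.9^d:

    # Fraction of the space that is "extreme" as dimensions are added.
    for d in (2, 3, 5, 10, 20, 50):
        print(f"{d:>2} dimensions: {1 - 0.9**d:.0%} extreme")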
Back to this particular article: when you _add_ (or average) all of the dimensions -- like you do on an exam -- suddenly they are close again.
According to the article, the average person doesn't exist, either. I don't know many people that are 13% fluent in Mandarin, 13% fluent in English, 9% fluent in Hindi... At the same time, having ~2 hands and ~10 fingers seems about right. Some metrics work with averages, some don't.
First of all, finite. There is a minimum and maximum. Second, questions tend to be internally correlated. (After all, they correspond to subjects.)
Third, students are not expected to be average but to pass all the questions.
The implementation varied between classes - in my World History class, there were a large number of objectives, and each objective was met by a small quiz that tested ~one skill. (There were a lot of retaken quizzes in that class.) In Biology, there were about 10 objectives for the entire semester, so you could still pass while missing a few small skills, as long as those missing skills were spread out among different units.
My high school used that "objectives" system less and less as I moved up the grades - I assume that most teachers got tired of it pretty quickly and just decided to make their usual teaching material "look like objectives" rather than rebuild their curriculum in later years.
Jobs that actually need a strong foundation in CS theory are very rare and will continue to be. The fantasy that you need a computer scientist to manage your CRUD app results in many people being incredibly overqualified for their positions, and is, in my opinion, one of the major reasons there's so much mental illness in the technology space.
The two focus on different goals; clearly they are not aligned, and they shouldn't be, either. I would be in favor of just getting the fundamentals to enter the workforce, getting my feet wet, getting a sense of how my interests match up with the market, and then pursuing focused education in areas of interest.
This would require a lot of support from academic institutions as well as from progressive employers, but it provides more room for longer and more meaningful relationships that are flexible, less rigid, and able to move faster to meet market needs.
A CS degree with at least 2 summer internships building real software ticks both boxes.
A Computer Science degree does not, and should not, be the sole qualifier for whether or not you want to be a programmer.
Many strong graduates wind up in roles at major companies - Google, Facebook, Amazon, Microsoft, etc - where they are working with teams to implement things that do require research, rigor, etc. Their value as a contributor is wrapped up in theory, the code is just an implementation detail.
Bootcampers, meanwhile, often find themselves at younger companies that are more focused on shipping features and stamping out bugs - areas where the ability to write and ship code quickly is a priority. The differences between a b-tree and a red-black tree will be moot to them unless they're interviewing; going beyond binary search, hashmap, and bloom filter sees diminishing returns on investment in the near term for most small companies.
This should be done in tandem with theory.
Questions are scored alpha for a completely correct solution, beta if the examinee demonstrated that they knew what they were doing but maybe made some small mistake, and gamma for a reasonable effort.
The bare minimum pass mark is one alpha.
It becomes a little like companies saying they value x & y, but take action only aligned to z.
Explicitly the aim is to eliminate students who haven't deeply understood some aspect of the curriculum, so accumulating lots of partial results is exactly what they don't want.
It's worth noting that outright failure is extremely rare and subject to an appeals process etc. Partly this is because this is a set of exams at the end of each year of instruction with no mechanism for a re-sit, so a student who fails will not graduate (the system isn't totally barbaric; there are mechanisms in place to handle health-related concerns etc.).
I looked up details, they can be found at https://www.maths.cam.ac.uk/undergrad/course/schedules.pdf
My memory is a little off. There is no gamma. And the pass (in the first year) is 2alpha * beta.
---
I suspect that the shape of the curve has to depend on the subjectivity of the test and on the grading: whether the questions are ones where you either know it or you don't, and how much partial credit graders are willing to give.