Digesting Data on Higher Ed - Slowing it Down

9/16/2015

-Brittany

With all of the reporting around College Scorecard this week, it struck me how difficult it is for a non-academic consumer of information (like myself) to find out exactly what a large set of data like this means. So I stopped skimming all the headlines and took a closer look... for what it's worth.

Mixed Messages
ProPublica's report "Colleges Flush with Cash Saddle Poorest Students with Debt," focused on how selective institutions drive up debt for low-income students: "More than a quarter of the nation’s 60 wealthiest universities leave their low-income students owing an average of more than $20,000 in federal loans."

This seems to support the findings from the Hoxby-Avery study from 2012 (see table 1: College Costs and Resources by Selectivity). But what's striking is that, given the way the two sources present their messages, you might come to two completely different conclusions about the data.

ProPublica = selective institutions are expensive for low-income high-achievers
Hoxby-Avery = selective institutions would cost less than non-selective schools for low-income high-achievers

Technical Barriers
I tried to look at the data myself. It wasn't easy. I went to College Scorecard, ED's public-facing website, and downloaded the raw data. I immediately realized:

I need to use STATA to see anything useful
I need to understand the coding enough to set up my parameters in STATA
I actually need to have STATA (usually that means being affiliated with an organization or school that has a license)
I need to know what STATA is. OK, I do, but still.

Comprehensiveness of Data?
And to add to the confusion, e-Literate reports that there are problems with the actual information gathered, including a big chunk of degree-granting community colleges that were simply overlooked.

On the numbers around completion rates, ED acknowledges that their data doesn't reflect the reality of adult education. Again, from e-Literate, a quote straight out of Dept. of Ed's report on College Scorecard:

"The most commonly referenced completion rates are those reported to IPEDS and are included on the College Scorecard... However, they rely on a school’s population of full-time students who are enrolled in college for the first-time. This is increasingly divergent from the profile of the typical college student, particularly at many two-year institutions and some four-year schools."

What to think?
Even without considering the effects of well-documented psychological phenomena like confirmation bias and priming, it seems like a herculean challenge to get a solid grasp on this stuff.

Any strategies to share?

2 Comments

Mimi

9/21/2015 09:45:44 am

Brittany-
I read your blog post last week just after posting my response to the admissions readings, titled "The Problems with Data".

Some quick excerpts here:
"I believe in data, and I think good data can do wonders for policymaking. But spending too much time with data invokes the realization of how manipulative it can be."
and
"Even randomized experiments’ biggest fan, Josh Angrist (MIT), concluded his 2004 paper “American Education Changes Tack”, a staunch support document of hard data with a cautionary note against poor uses of data, “Trojan horses of conflicts of interest”, and the ramifications of putting too high on a pedestal conclusions with a “veneer of science”. The status that numbers seem to have in education, from kindergarten through college, is problematic."

In essence, my takeaway on "data problems" is three-fold: (1) the quality of the data and our misuse of external validity in education research, (2) the misuse of findings to inform policy, and (3) a larger problem encompassing the use of test scores and GPAs, both in higher ed admissions and as outcomes in education research.

A couple days ago, Amy Laitinen wrote "Why the U.S. Needs Better Student Data" for the Chronicle (http://chronicle.com/article/Why-the-US-Needs-Better/233253/?cid=at&utm_source=at&utm_medium=en , special thanks to Robert for the link!). She argues that the problems we're finding with bad data are not accidental (they stem mostly from a 2008 law prohibiting student information databases). Her article is pessimistic about the present yet hopeful about the possible future of this new DOE data.

While I appreciated Laitinen's commentary, another recent article, "New Data Gives Clearer Picture of Student Debt" (http://www.nytimes.com/2015/09/11/upshot/new-data-gives-clearer-picture-of-student-debt.html?hp&action=click&pgtype=Homepage&module=mini-moth&region=top-stories-below&WT.nav=top-stories-below&_r=0), gives a much more thorough and pointed critique of higher ed data. This is the kind of writing and analysis we need. Dynarski homes in on her subject (student debt) and identifies a key problem with prior data (for-profit colleges and CCs not included in summary statistics).

I guess I don't have an answer for you, really, other than "I know how to use Stata". But we could talk in circles about problems with data. Not that we shouldn't identify them! But at a certain point, do we ever move on? To me, the startling thing about the College Scorecard was the data it emphasized, and how it compares in intended audience and content to, say, Fiske's Guide. Over the next decade or so, the implications of the Scorecard will be seen, probably with flawed data, but seen nonetheless.

Until we have "good" data (unsure of what this even looks like), the way to move forward, for me, is to identify the data's flaws vigilantly, and try to work with them rather than around them.

Brittany

9/21/2015 12:30:28 pm

@Mimi: Thanks so much for your response and the article links! I realized of course after I posted this that I had been looking at the wrong part of the College Scorecard website and that their comparison tool is actually quite useful. So it might be that I would never need or want to analyze the data myself in a program like STATA.

Even still, what bothers me is the larger issue of how we responsibly consume scientific information. I think your point is good: we need to keep the flaws in a data set at the forefront of any interpretation. But often, we are relying on others to find the flaws and report them. Not all media outlets will have the same agenda there. I think I just struggle as a consumer of information who doesn't have the time to do a deep dive into the background of all the issues that I find important. It can be difficult to know what sources to trust.

...Slightly different topic than what I was first contemplating, but something I think about a lot.


Steven Volk
