Tuesday, February 16, 2016

A nuanced look at Student Data Privacy

Student data privacy is a big concern with the proliferation of data in education. But is all data equally concerning? We tend to put all “Student Data” in one big bucket and treat it as if it’s all the same. Yet we know there are drastic differences among schools and EdTech companies in their mission, approach, pedagogy and tools: is it really reasonable to assume that all of their data is the same? And, more importantly, are all data and all uses of that data equally threatening from a student privacy standpoint?

I’ll give away the punchline: my answer is no, not all student data is equal. But if it’s not all equal, what criteria can we use to distinguish student data? Here are some ideas.

1. Product Improvement and Data


Let’s start with a funny little clause that pervades the privacy policies of many EdTech companies: “Student data can be used for product improvement.” That last phrase is the important part: what does “product improvement” really mean? Well, it depends on the product, and on the company that produces it. Let’s make a broad distinction between two kinds of companies: content providers and content distributors.

A content provider produces educational content, whether that content is lessons, movies, interactives, games, or activities. Its value comes from the value of its content: its product is, in fact, the content itself. A content distributor is a company that finds and distributes content to learners. Its value comes from finding quality content within a pool of existing content: its product is the engine or algorithm that profiles users and finds valuable content for them.

Now let’s go back to that phrase “product improvement.” For a content provider, it’s quite clear what this phrase means: making their content better. It means finding out where their games and activities fail learners, and turning that content into a better, more productive learning experience. For a content distributor, the situation is a little different. Product improvement means making a better algorithm, one that more readily understands users and makes more apt recommendations to them. A content distributor can’t affect the content itself; they didn’t produce the content, they just distribute it. Their improvement can only come from using data to build more complete profiles of students that serve better recommendations.

Put another way, content providers use data to better understand how their content works (inward-focused, on the product), whereas content distributors use data to better understand how their users work (outward-focused, on the student). These are very different uses of data, with radically different potential harms and benefits.

Of course, it may actually be quite rare for a company to fall solely into one camp or the other. Content providers often make so much content that profiling users to make recommendations among their own content becomes useful and necessary. And content distributors often have a good deal of control over their content, and can choose what content to include or exclude in their algorithm. Yet even when these uses are mixed within one company, it seems useful to govern data-for-content-improvement procedures under a different set of guidelines from data-for-better-profiling procedures. Here’s why:

Using data for content improvement carries little to no risk for the student. The student isn’t the direct subject of the research: the educational content is the direct “research subject.” On the other hand, there is significant potential for an indirect benefit to the student, because their data can lead to content improvement, which improves their own future learning on the product and that of other students.

Using data for profiling students carries greater potential benefit, but also a greater direct risk to the students. The benefit is that we know learning is context-dependent: not everyone learns best in the same way, and the potential to personalize learning to a learner’s specific needs can greatly improve the learning process. On the other hand, the student is now the direct “research subject”: a profile is being built for each student, and that profile could be used to stereotype or stigmatize students as easily as it can be used to help them. How harmful this can be depends on the information and algorithms used to build the profiles, as well as on the intended use of the profile. How these profiles are used and acted upon is a really important issue for the ethical use of data and for social justice, and it certainly deserves concern and attention.

2. The Robot Tutor and Data


OK, now let’s jump to another idea about data use: the “robot tutor” that replaces a teacher. A successful robot tutor is, of course, heavily fueled by data, and from this the converse is often assumed: that using data means you are trying to build a robot tutor. But there are many uses for student data, and a robot tutor is only one of them.

Getting from data to the robot tutor is actually a two-step process. Step one: should we use data to understand student learning? Here I think almost everyone agrees that the answer is yes. We were using data to understand student learning long before digital technology came around: that’s why we assigned points to student work, marked grades, and tallied them in a gradebook. In fact, I would almost say that if you aren’t using data to understand student learning, you aren’t doing everything you can to improve a student’s educational experience.

So now let’s assume that we definitely want to use data to understand student learning. Step two: who should act upon that understanding of student learning, a teacher or a machine? This is the crucial step. For an EdTech company, the former answer leads to the development of teacher dashboards; the latter leads to the development of a robot tutor. I don’t think it should be assumed that data equals robot tutor: for many companies, data simply equals teacher dashboard, which is not a replacement for a teacher but a tool that helps a teacher make more informed decisions.

For student data privacy, distinguishing the use of data for dashboards from its use for robot tutors is important. In the former case, the data is used by the teacher, a person we commonly accept as a maker of educational decisions about students and hardly an objectionable user of data. In the latter case, the data is used by a company to make educational decisions about a student. Although this latter case is not always bad, it certainly falls into a more ethically fraught category that should be subject to more scrutiny than the former.

3. Adaptive Learning and Data


One assumption underlying the previous two sections is that data and adaptive learning are synonymous. Anyone who is collecting data must surely be trying to build an adaptive learning algorithm! But as the examples above illustrate, that is far from true. There are many uses for data, several of which have little to do with adaptive learning.

Data in education doesn’t have to mean adaptive learning. And this matters, because the most sensitive and objectionable uses of student data typically occur in adaptive learning. Adaptive learning often tries to build the most complete profiles of students possible, and tends toward a robot-tutor kind of solution to education.

When all student data is dumped into one bucket, we have to design standards and guidelines for the most objectionable scenarios contained in that bucket. Yet data can be (and is being) used by many companies to directly improve products and to provide actionable dashboards to teachers. I argued above that this category carries minimal risk to students, and it seems wrong to put these uses in the same bucket as adaptive learning.

This isn’t meant to be an argument for or against adaptive learning and how it uses data; that will have to wait for another day. This is really just a call to be more nuanced and thoughtful in how we consider data in education. Not all data, and not all uses of data, are equal: student data deserves multiple buckets.
