Inside the modern-day police crime lab

May 13, 2021

Guests

Dr. Cynthia Rudin, Mathematics Professor at Duke University

Giovanni Gaccione, Director of Strategic Planning & Portfolio Management at Genetec

Description

What is a black box? How is data interpretation helping law enforcement? In Engage Episode 7, we look at how interpretable models are helping law enforcement discover new patterns to solve crimes. Two thought leaders join the conversation: Dr. Cynthia Rudin, Mathematics Professor at Duke University, whose work with several major US police departments has helped define best practices in interpretable machine learning, and Giovanni Gaccione, Director of Strategic Planning and Portfolio Management at Genetec.

Transcript

DAVID CHAUVIN (HOST): Welcome to Engage, a Genetec podcast. 

 "Okay, Jed, what's coming? Double homicide, one male, one female. Killer's male, white, 40. Set up a perimeter and tell them we're in route. I'm placing you under arrest for the future murder of Sarah Marks. Give the man his head. The future can be seen." - Minority Report. 

DAVID CHAUVIN: But can it? When the subject of next-generation crime-solving technology comes up, many think of this film, the 2002 sci-fi classic Minority Report, and its frightening portrayal of the future of policing. 

KELLY LAWETZ (HOST): Fortunately, it bears zero resemblance to modern-day crime labs and the practice of data-driven policing. Still, when the subject of predictive policing comes up, many run for the hills without understanding what it's all about. I'm Kelly Lawetz. 

DAVID CHAUVIN: And I'm David Chauvin.  

DAVID CHAUVIN: While science fiction can help us predict the dangers of a world where technology is left to trample over civil liberties. 

KELLY LAWETZ: Actual science, in this case, data science in the right hands and developed with proper sensitivities, can help guide law enforcement toward the correct conclusions and even the right predictions. 

"Bad bail and parole decisions are being made because people typed the wrong number into a black-box model. It's letting dangerous people go free, and it's keeping people in prison who don't deserve to be there." - Cynthia Rudin. 

KELLY LAWETZ: That's my guest, Dr. Cynthia Rudin, a mathematics professor at Duke University, whose mile-long CV brought her to the Cambridge police to help solve a series of break-ins. Notably absent from their datasets were demographics, socioeconomic status, and race, the data points generally associated with questionable profiling practices. I spoke to Dr. Rudin in the first half of the show. 

DAVID CHAUVIN: Followed back-to-back by my interview with former Genetec Law Enforcement Practice Lead Giovanni Gaccione. 

Interview with Cynthia Rudin

KELLY LAWETZ: But first, we set the stage with Dr. Rudin, who brings us right into the beating heart of the modern-day crime lab. Her work is at the cutting edge of interpretable machine learning: designing complex models that remain understandable to humans. The discipline stands in sharp contrast to, and is a reaction against, what are known as black-box models. I began by asking her to help us understand that difference. 

CYNTHIA RUDIN: A black box is a predictive model that no human can understand, either because it's too complex or because it's proprietary. Proprietary means it's some company's secret sauce, and they don't want anyone to know how the predictions are being made. The problem is that black-box models are great for certain things, but if you're doing anything even slightly high stakes, the black box generally needs to go. We want models that humans can understand, particularly in health care or criminal justice. In criminal justice, people were subjected to black-box models that made wrong predictions, either because of typographical errors in the data input or because of improper design, and those predictions caused people harm. We want to develop models that people can understand so that doesn't happen. There are many reasons why you don't want a black-box model in health care, for instance. You want the doctor to oversee their own patient. You don't want a doctor who doesn't understand what the model does but trusts it anyway. Instead, you want the doctor to do the kind of systems-level thinking that humans are good at and recognize that there's information about this patient that's not in the database or the predictive model. Doctors need to take that into account when they're figuring out how to treat patients. They can't just go with the model's prediction. Predictive models are good at different things than humans are. They're great for calculating probabilities from large databases, which is something humans can't do in their heads. But you need them to work together. Humans and machines work together to create those predictions. 

KELLY LAWETZ: You've worked with organizations, governments, and police. What's it like working with law enforcement and law enforcement agencies? Because from what I've heard, they're data rich and information poor. 

CYNTHIA RUDIN: Well, let me tell you about the problem I worked on, because it's different from what AI has been used for lately. The problem I worked on was fascinating. When I was a professor at MIT, I got an email from one of the centers I was associated with, saying the Cambridge Police Department has a problem; if you think you might want to help, come to the meeting. I show up at the meeting, and there were many senior professors and police officers. It was very intimidating. They were all standing, and we were all sitting. It was exciting. A detective, Dan Wagner, stood up and said, we have this problem: we want to solve housebreaks in Cambridge. We have crime series occurring, there are groups of people committing break-ins, and we want to link the crimes together. We want to know which ones are connected to each other so we can do something about it. Does anyone want to help? After describing the problem, they went around the room, and everybody was willing to supervise but not actually do the work. When they got to me, I said I'd do it, and I was really excited about it. It turned out to be a fantastic project. A brilliant data analyst at the police department had noticed several pickpocket crimes at the same cafe. It was like a needle in a haystack, because the cafe was in Harvard Square and there were crimes all over, which made it tough to pick up this Tuesday-Thursday pattern. Once the analyst figured it out, they sent a police officer to get their pocket picked and made the arrest. Having found that unique pattern by hand, they wondered whether the process could be automated, and they decided to try it with housebreaks. A housebreak is when someone breaks into a house, steals things, and usually leaves without being detected. There's usually no description of what the person looks like, and you generally don't know when they came. So, they created a database covering ten years. Each break-in was recorded with all the available information. Was it an apartment, a single-family house, or a multifamily house? How did the person break in? What did they steal? Was it a weekday or a weekend? Where was it? Then we took that data and tried to figure out the modus operandi of the crimes while simultaneously figuring out which crimes were related. It was a challenging clustering problem, because we didn't know which factors were part of the modus operandi until we knew which crimes were related, but we also didn't know which crimes were in a series until we figured out the modus operandi. So, it was a subspace clustering problem that was analytically difficult. It was exciting and a really cool project. At some point, we realized that we weren't a company, and all we could do was write code. We couldn't build it into a product or anything. So, we released the code, and a brilliant young data analyst in the NYPD Data Science Group picked it up and decided to implement it at the NYPD. He would email us from time to time to ask questions about the code, and he built it into the New York Police Department's whole system. They have been using it for several years to find crime series, which I think is cool. 
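
To make the "cluster the crimes while learning the modus operandi" idea concrete, here is a minimal sketch in Python. It is not the code released by Dr. Rudin's group or the NYPD's implementation; the features, weighting scheme, and data are all invented for illustration. It simply alternates between assigning crimes to series and re-weighting the features that look most MO-defining.

```python
# Hypothetical sketch of the simultaneous "which crimes go together / which
# features define the MO" idea described above. Not the actual algorithm used
# in Cambridge or at the NYPD; features and data are made up.

import numpy as np

rng = np.random.default_rng(0)

# Fake data: 60 break-ins described by 6 binary MO features
# (apartment?, forced entry?, weekday?, valuables taken?, ...).
X = rng.integers(0, 2, size=(60, 6)).astype(float)

n_series = 3
assign = rng.integers(0, n_series, size=len(X))   # initial guess at crime series
weights = np.ones(X.shape[1])                     # which features define an MO

for _ in range(20):
    # Step 1: given the feature weights, assign each crime to the closest
    # series profile (mean feature vector of the crimes currently in it).
    profiles = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                         else np.zeros(X.shape[1]) for k in range(n_series)])
    dists = ((X[:, None, :] - profiles[None, :, :]) ** 2 * weights).sum(axis=2)
    assign = dists.argmin(axis=1)

    # Step 2: given the assignments, up-weight features that are consistent
    # within each series (low within-series variance = part of the MO).
    within_var = np.array([X[assign == k].var(axis=0) if np.any(assign == k)
                           else np.ones(X.shape[1]) for k in range(n_series)]).mean(axis=0)
    weights = 1.0 / (within_var + 1e-3)
    weights /= weights.sum()

print("series sizes:", np.bincount(assign, minlength=n_series))
print("feature weights (higher = more MO-defining):", np.round(weights, 2))
```

The alternation mirrors the chicken-and-egg structure Dr. Rudin describes: the series assignments and the MO features are each estimated from the current guess of the other.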

KELLY LAWETZ: So, one question that always comes up when we're talking about machine learning and artificial intelligence: how does interpretable machine learning eliminate discrimination? 

CYNTHIA RUDIN: It's much easier to detect if a model is biased or depends on the wrong information when it's interpretable. 

KELLY LAWETZ: Can you give me an example? 

CYNTHIA RUDIN: Yeah, the example that I really like is the COMPAS scandal; that example just blows my mind. In 2016, ProPublica wrote an article that sort of set off the entire algorithmic bias conversation. Apparently, a model used in the criminal justice system depended on race and was biased against black people, and the article put the blame squarely on the algorithm. At the time, I had just written a paper on interpretable models for prediction in the criminal justice system. We had a section on racial bias in our report before that article came out. Our paper basically said you don't need a black box, because we can design an interpretable model. We don't think the model needs to depend on race, because you're not getting any predictive accuracy from putting race in the model. But you must be careful, because race is correlated with age and criminal history, and that's due to systemic racism and bias in society. From what I knew, I thought it would be bizarre if ProPublica was right, because they said there's a model, it's a black box, it's biased against black people, and it uses race in addition to age and criminal history. I thought to myself, firstly, why are we using black-box models in the justice system at all, since you can't troubleshoot them? Secondly, why would these models depend on race on top of age and criminal history, when race doesn't give you any extra predictive juice? I thought, okay, I'll have to go and figure out what happened here, because everyone's getting really upset about this, and I'm not sure it's for the right reasons. They should be upset not just about the racial bias but that this is a black box and we don't know what's in it. It's tough to work with black-box models because you don't know what's in them; that's the whole point. We took the data from Florida, where ProPublica was investigating, created some careful plots, and tried to figure out what happened in their analysis. It turns out ProPublica made a crucial assumption: if you approximate the COMPAS scores, that is, approximate the black box, with a linear model in terms of age, criminal history, and race, then whatever variables are essential in the approximation would also be important in the black box. They found that when you approximate COMPAS that way, the linear model depends on race; therefore, they concluded, COMPAS depends on race. That assumption is problematic, because we don't think COMPAS is linear in those variables. When you make this kind of approximation and then claim that the essential variables in the approximation are the essential variables in the model, that's not valid. When we accounted for the proper non-linearity with respect to age, all the results from the ProPublica article went away, and it seems that COMPAS depends only on age and criminal history. I understand what ProPublica was trying to do. They were trying to raise awareness of the fact that these models do depend on race. But I think the article should have said: I can't believe black-box models are being used in the justice system; they're a problem, and you can't figure out how they depend on race. Here, they probably rely on race indirectly, through age and criminal history, and we need to think for ourselves whether that's okay or not. 
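
The surrogate-model pitfall Dr. Rudin describes can be reproduced on purely synthetic data. The sketch below is not COMPAS and not ProPublica's actual analysis; the score formula, the age-race correlation, and all numbers are invented. It only shows that a linear approximation of a non-linear score can pick up a "race" coefficient that vanishes once the right non-linearity in age is included.

```python
# Hypothetical illustration of the surrogate-model pitfall: a linear
# approximation of a non-linear score can appear to depend on race even when
# the underlying score does not. All data here are synthetic.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5000

age = rng.uniform(18, 70, n)
priors = rng.poisson(2, n).astype(float)
# Synthetic correlation: a binary "race" indicator more likely at younger ages,
# standing in for the real-world correlation Rudin attributes to systemic bias.
race = (rng.random(n) < 1 / (1 + np.exp((age - 35) / 6))).astype(float)

# A synthetic "black box" that is non-linear in age and ignores race entirely.
score = 10 * np.exp(-(age - 18) / 12) + 0.8 * priors

# Linear surrogate in (age, priors, race): race picks up part of the
# unmodeled age curvature, so its coefficient comes out non-zero.
linear = LinearRegression().fit(np.column_stack([age, priors, race]), score)
print("linear surrogate race coefficient:", round(linear.coef_[2], 3))

# Give the surrogate the right non-linearity in age and the race effect vanishes.
nonlinear = LinearRegression().fit(
    np.column_stack([np.exp(-(age - 18) / 12), priors, race]), score)
print("non-linear surrogate race coefficient:", round(nonlinear.coef_[2], 3))  # ~0
```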

KELLY LAWETZ: You know, there's a lot of benefit in health care, in business, and in criminal justice to using machine learning to help these organizations make better decisions. Often, what I find is that it's the communication, or rather the lack of communication, regarding what that agency is doing, how they're doing it, how they're getting the results, and how they're using them to make decisions. I'm wondering, from your perspective, how would you improve the way organizations communicate the value of that to the public?   

CYNTHIA RUDIN: Take some of the risk scoring models. The Arnold Foundation created its own risk scoring model and published it. It's just a straightforward scoring system, the kind that's been used for one hundred years. They've gotten a lot less pushback on that model than on something like COMPAS, where we have no idea how it makes its calculations. If people can look at the model, they can decide whether it's a good idea. Right? We need to decide: here's how the model depends on age, do you agree with that? Disagreeing with that is very different from not being able to tell how it depends on age at all. 
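
For contrast with a black box, here is what a transparent, point-based scoring system of the kind Dr. Rudin mentions can look like. The cut-offs and point values below are hypothetical, chosen only for illustration; they are not the Arnold Foundation's published weights.

```python
# A minimal sketch of a transparent, point-based risk score. The thresholds
# and point values are invented for illustration, not taken from any real tool.

def risk_score(age: int, prior_convictions: int, prior_failures_to_appear: int) -> int:
    """Return a small integer score; higher means higher estimated risk."""
    score = 0
    if age < 23:
        score += 2          # youth adds 2 points
    if prior_convictions >= 3:
        score += 2          # extensive record adds 2 points
    elif prior_convictions >= 1:
        score += 1          # any record adds 1 point
    if prior_failures_to_appear >= 1:
        score += 1          # prior failure to appear adds 1 point
    return score            # range: 0 (lowest risk) to 5 (highest risk)

# Anyone, including the person being scored, can audit every point:
print(risk_score(age=21, prior_convictions=1, prior_failures_to_appear=0))  # 3
```

Because every point is visible, the debate Rudin wants, "do you agree the model should depend on age this way?", can actually take place.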

KELLY LAWETZ: Have you engaged in those conversations with the public and the police? Have you participated in anything where the model is public? And then, is it people from all walks of life, not just statisticians, not just the organization, but the public who will be at the receiving end of those decisions? 

CYNTHIA RUDIN: I've done some amount of engagement. Lately, it's been this massive argument within the mathematics community. Namely, thirteen hundred mathematicians signed a petition saying we shouldn't be using any statistics in the justice system or in the courtroom, and I disagree with that. The problem is that low-risk people are being put behind bars too often. If you have these risk scores, they help reduce that number. It is so much easier to de-bias an algorithm than it is to de-bias a judge. As much as I sympathize with people wanting to help reduce bias and make our society better, I don't think that telling mathematicians not to work with police departments is the way to do it. Ingrid Daubechies and I wrote a response saying, no, we really should convene a group of people to work together to figure out the best way to do this. We don't want to just omit numbers and statistics from the courtroom. We want to use both people and algorithms to their best abilities. Ideally, we want a centaur, a human-machine team that works together better than either one alone. Computers can aggregate data sets and compute probabilities, while humans can adjust the scores for what we know about the person standing in front of us, instead of what's going on now, where the judge just looks at the person, can see their race, and makes a judgment. The judge is a black box. That's the kind of engagement I've had most recently, just trying to do damage control on this idea that mathematicians should never talk to police, which I really disagree with. What happens when mathematicians say we're not going to speak to police, and companies want to sell police a flawed facial recognition program? If there's no engagement, then there's no one to tell the police they shouldn't buy it. So, yes, that's the level we're at right now. 

KELLY LAWETZ: Cynthia Rudin, thank you so much for sharing your time and insights with us today. That's Dr. Cynthia Rudin, a mathematics professor at Duke University, staunch black-box critic, and a leader in interpretable machine learning. 

Interview with Giovanni Gaccione

"Chicago Police Department is the second-largest police department in the United States. There are twenty-two police districts in the city. We have about twelve thousand six hundred sworn, then there's a headquarters facility, and there are other specialized units. We were experiencing a record high number of shootings and murders in 2016. It was a tough year, and the mayor and the superintendent knew that we had many different information sources and many different platforms that users had to access those sources. And we didn't have a single place where everything came together." 

"The team that was here talking to Chicago saw this as an expansion of the Genetec product line, that this could be really something beyond what we currently do. The idea now is can we bring all that information together? And that's really where the idea of Citygraf started." - Giovanni Gaccione. 

DAVID CHAUVIN: That's Giovanni Gaccione, Director of Strategic Planning and Portfolio Management at Genetec. I started by jumping right into the technology to better understand whether it was simply designed to make cracking cases easier or to let agencies do things they couldn't do before. 

GIOVANNI GACCIONE: From a Genetec perspective, we were cautious about all the trendy words like machine learning and AI. Citigraf and Valkyrie fit in by mining data to allow humans to make their own intelligent decisions on that data. So, it's about using machines to do the heavy lifting. You accentuate the intelligence and the decision-making that the human adds to it. It's really the marrying of both of those things. 

DAVID CHAUVIN: Let's talk about data points. What kind of data are we talking about here? 

GIOVANNI GACCIONE: The most exciting thing with cities, and even with law enforcement, is that you can have all sorts of different data points. This data is crime data, 311 call data, 911 calls, different types of records, record management systems, jail management systems, anything and everything. So, the whole concept here is trying to allow investigators to connect those dots quicker by leveraging these platforms to search the data sets. 

DAVID CHAUVIN: So, to be clear, you're not getting data from sources that wouldn't usually be accessible to law enforcement. Obviously, privacy is a huge concern, and there's worry about intrusive police forces and law enforcement agencies overstepping boundaries with Valkyrie, Citigraf, or tools from other suppliers, and there are quite a few out there. Is the guiding principle, then, that you only touch the data that would be there regardless of whether your tool was present or not? 

GIOVANNI GACCIONE: Yeah, that's correct. We don't create any new artifacts in the city, and all the data we touch is city-owned or law enforcement-owned data. We're not going out and scavenging social media platforms. We're not hitting sources that law enforcement or government agencies don't already have access to. What we do is once they have that access, it's about cataloging it, searching it, and finding those connections within the data set. 

DAVID CHAUVIN: A world like law enforcement is traditionally a lot of boots on the ground and manual labor, and investigations are all about talking to people and interviewing suspects, witnesses, and people of interest. How is adoption going? Are agencies saying, oh wow, the computer can do a big part of the job, the computer can do the heavy lifting? Is it reasonably well-received, or is there a lot of resistance? 

GIOVANNI GACCIONE: It's a double-edged sword, because we have cities and city agencies looking for technology to help them in their everyday job, but they're also getting bombarded by a lot of buzzwords. What we're trying to do is get police from where they are today to just one step above: getting them connected to their data sources and breaking down silos. We want to get them to where they need to be in 5 or 10 years, but we want to take that first step. So, we're taking manual steps and automating them, so that the system goes through the records for them.   

DAVID CHAUVIN: Now, one subject I want to talk about is the implicit bias issue. It's becoming more mainstream and a more significant part of the conversation: implicit bias having an impact on law enforcement, on the entire justice system, all the way to judges and sentencing. Do you think that tools like Citigraf or Valkyrie, or others out there, can help agencies remove some of the implicit bias that would typically occur during an investigation? Or is there a risk that those tools could further that bias by helping detectives reach their conclusion faster than if all the work was manual? 

GIOVANNI GACCIONE: We don't have prediction algorithms. We don't have any of those algorithms, so we're not adding our own bias on top of data that might already be biased. Again, it goes back to giving utilitarian tools to automate that extensive data search. The second part of the conversation is, how can we help eliminate that bias? The considerable value that Valkyrie and Citigraf bring is that you can test out your hypotheses quicker, which means that you, as an investigator, aren't as invested in any one hypothesis. We would like to believe that by testing those hypotheses within hours, or maybe days, rather than weeks, you might be able to freely toss a theory and try a better one. 

DAVID CHAUVIN: What percentage of law enforcement agencies in North America would you say have adopted some sort of technology to help connect those dots? 

GIOVANNI GACCIONE: If you look at all technology as like a seven-year curve, from early adopters all the way to mature products, I'd say we're probably around the year three mark. Everyone understands now what the technology is; it's not new to them when you bring it to them, but rather how they implement that technology is new. 

DAVID CHAUVIN: Are there many applications outside of law enforcement, or are these technologies targeted at law enforcement? 

GIOVANNI GACCIONE: The way it's being built today is as a tool to optimize operations. We decided that we're going to pick one vertical to go after right now. But the way the infrastructure was built out is around two questions: A, how do we make operations more efficient, and B, how do we help people investigate data? It just so happens to be crime data at the moment, but any type of mass data where you're looking for those connections will still work in this. So, to answer your question, today we're entirely focused on crime, cities, and government. Still, we did not build this to live only in this vertical. 

DAVID CHAUVIN: As a private citizen, are you concerned at all with what's now possible with these types of technologies, connecting the dots, reading your license plate? 

GIOVANNI GACCIONE: Well, I'm happy many different media outlets and associations are looking into it. Everyone needs to know what's going on and how decisions are made, and it's a healthy dialog that's happening. I do hope that it continues with some sort of oversight into the future. I think that'll be good. My concern is if it goes the other way and there is no oversight or discussion. If we just allow different agencies to dictate what happens, I think that's the wrong thing to do. But with all these new conversations, you have various associations checking up and asking what exactly you are using for algorithms. We saw enormous pushback on facial recognition usage. It shows the power of citizens to really force policy in the direction they want it to go. We're about four years out from when that occurred. We could have been in a very different position today, with face recognition across almost every IoT device, but that's not the case. You're having discussions in every city about that technology: who uses it, and can you use it at all? If governments and policy owners dictate that you can or can't use it, or that you must use it in a certain way, technology companies will adjust. They'll adapt, and they'll make sure they move in those directions. But I also think it's vital that city agencies and different governments make sure they know where their technology is coming from. It's not good enough to say, at the surface, we bought this technology from X, Y, Z; you really need to understand the algorithm they're using. Where did it come from? Is this code being developed in different parts of the world? It's also the customer's responsibility to understand where their products come from. We can only do so much from our own safeguards as a company to protect what we sell. That won't stop a customer from going outside the borders to purchase something somewhere else. So, to me, it's a relationship. We both must make sure we understand what we're doing on both sides of the equation, as a manufacturer and as a customer. 

DAVID CHAUVIN: Considering the current climate of distrust between a significant portion of the population and government agencies in general, not just law enforcement, how can governments educate people? How can they make sure people trust what the technology can do and how they're using it? 

GIOVANNI GACCIONE: Trust is lost in a second. Where that trust gets lost is when, as an agency, you pick and choose which things you're going to talk about and be transparent about, and which things you're not. What we've seen is agencies trying to be very transparent about specific policies, but then news comes out about some other thing they were doing, and that breaks the trust, and it's this constant cycle. From a trust perspective, it needs to be just like everything else: it needs to be a holistic process. 

DAVID CHAUVIN: Do you think that there's a responsibility for manufacturers, or for the security industry, to work closely with civil liberties groups like the ACLU, or other organizations that defend civil rights and privacy? 

GIOVANNI GACCIONE: I've always been an advocate for getting civil liberty groups and associations involved earlier in the process. There's nothing better than going out to these groups during early development, or during the thought process behind these technologies, and getting feedback. They are a stakeholder in this as much as everybody else. If you involve them sooner, you can build those checks into your product early on, and they can help you along the way. It should not just be, we made this technology, what do you think? It should be, we're thinking about doing this; can we get some input on it? 

DAVID CHAUVIN: What's the next big frontier of these correlation engines and data capture systems for law enforcement? What's the next big thing? 

GIOVANNI GACCIONE: Once we get these different data sets in, it's really about working with all the cities on what they can do with them. Most of the cities we're talking to already have lots of data scientists, and they already know what they want to try. Right now, it's about proving that what they want to try is successful. So, we're in this proving phase, trying to make sure this technology can actually work. Once that gets proven out, you can do other things. Where we hope this goes is cross-department interaction. And what is a cross-department interaction? Let's imagine a car driving on the highway flags in Waze or Google that an object is on the road or a car has pulled over. Well, that's such rich data about where that car is and what's known about it. Not only can that information go to the traffic department, but it would be incredible if the traffic department, while triaging that event, looks at the camera, sees that maybe it's two vehicles, and can then push that to the police. So you have information that can flow easily from traffic to police, whereas today we're talking about siloed data within a department. The next step, to me, is breaking down silos between departments so that traffic can share an incident with the police, or the police can share an incident with fire. When the information crosses these boundaries, because we understand the data, we can ensure the correct privacy filters are applied, so that we're not sharing crime data with the fire department, and the traffic team is not sharing personal information about the tow truck or the tow company that went out there with the police department. So, the next significant phase is going to be how successfully we can transfer these pieces of information with the proper privacy filters, let's call it. 
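
As a rough illustration of the cross-department hand-off with privacy filters that Gaccione describes, here is a small sketch. The field names, department rules, and routing logic are invented; this is not Citigraf's actual data model or API.

```python
# Hypothetical sketch of sharing one incident across departments while
# filtering out fields each department should not see. All names are invented.

from dataclasses import dataclass
from typing import Dict


@dataclass
class Incident:
    fields: Dict[str, str]


# Each receiving department only gets the fields it is allowed to see.
ALLOWED_FIELDS = {
    "police":  {"location", "time", "vehicle_count", "camera_id"},
    "fire":    {"location", "time", "vehicle_count"},
    "traffic": {"location", "time", "vehicle_count", "tow_company"},
}


def share(incident: Incident, department: str) -> Incident:
    """Return a copy of the incident containing only the fields the department may see."""
    allowed = ALLOWED_FIELDS[department]
    return Incident({k: v for k, v in incident.fields.items() if k in allowed})


highway_event = Incident({
    "location": "I-90 mile 14", "time": "14:32", "vehicle_count": "2",
    "camera_id": "CAM-204", "tow_company": "Acme Towing",
})
print(share(highway_event, "police").fields)  # no tow_company
print(share(highway_event, "fire").fields)    # no camera_id, no tow_company
```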

DAVID CHAUVIN: Are you generally optimistic about the change that could bring? 

GIOVANNI GACCIONE: Well, I wouldn't have brought it up if I thought it was impossible. But the transformations that are happening in cities are so foundational that I do think it'll change. You now have Chief Information Officers (CIOs) and Chief Data Officers (CDOs). You have all these new C-level positions paving the way and setting a vision for where they will be with this data. It will help them say, here are the enhancements we can give back to citizens by marrying this data, or by allowing this data to really work together. At the end of the day, cities provide services to citizens. Once you figure out what benefits you can give back using this treasure trove we call data, the possibilities are endless. The majority of it will happen within five to seven years of having a game plan and starting to execute on that game plan. You're going to see cities already ahead of the curve making tremendous wins on things they're already doing. The work that cities and law enforcement agencies are putting in is breathtaking: the things they want to provide their citizens, the technology, how good they're trying to be, how transparent they're trying to be, how they're making it a priority, and how they're including groups like the ACLU earlier. They're doing all the right things. I'm super excited, but just like every relationship, it takes time. You can't go from where we are today to there by next week. Everyone believes in it. This will take time. But I see the steps they're taking, where they're putting their money, and how they're training their recruits; all of that is in the right direction. So, it's going to be really cool in the future. 

DAVID CHAUVIN: Fantastic. Thank you so much for your time, Giovanni. 

GIOVANNI GACCIONE: Thank you.   

DAVID CHAUVIN: That's Giovanni Gaccione, now Director of Strategic Planning and Portfolio Management at Genetec. We hope you've enjoyed this episode of Engage on the challenges and opportunities of modern-day data-driven policing. I'm David Chauvin. See you next time on Engage. 

Engage, a Genetec podcast, is produced by Bren Tully Walsh. The associate producer is Angele Paquette. Sound design is provided by Vladislav Pronin. Our production coordinator is Andrew Richa. The show's executive producer is Tracey Ades. Engage, a Genetec podcast, is a production of Genetec Inc. The views expressed by the guests are not necessarily those of Genetec, its partners, or customers. For more episodes, visit our website at www.genetec.com, listen on your favorite podcasting app, or ask your smart speaker to play Engage, a Genetec podcast.