Finding Clusters of NYPD Officers In CCRB Complaint Data
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624505769/Screen_Shot_2021-06-21_at_8.58.50_PM.jpg)
Why?
Complaints filed against police officers by the public are often the first and only warning sign that a cop might be on a course of escalating violence.
In the deaths of George Floyd and Eric Garner their killers had a documented history of complaints filed against them. Unfortunately nothing was done to disrupt their pattern of abuse, and both cases ended in those officers killing members of the public they had sworn to protect.
Table of contents
George Floyd
Chauvin, who was fired, has said through his attorney that his handling of Floyd’s arrest was a reasonable use of authorized force. But he was the subject of at least 22 complaints or internal investigations during his more than 19 years at the department, only one of which resulted in discipline. These new interviews show not only that he may have used excessive force in the past, but that he had used startlingly similar techniques. 'That could have been me': The people Derek Chauvin choked before George Floyd
The officer convicted of murdering George Floyd had at least 22 complaints against him. The officer who put Eric Garner in a chokehold and killed him had 7 complaints filed against him.
Eric Garner
Before he put Garner in the chokehold, the records show, he had seven disciplinary complaints and 14 individual allegations lodged against him. Four of those allegations were substantiated by an independent review board. The disturbing secret history of the NYPD officer who killed Eric Garner
Of the 14 individual allegations against Garner's killer, 5 are for force: "hit against inanimate object", "physical force", and a single complaint in 2014 that would foreshadow the behavior that would eventually end the Officer's career: "Force - Chokehold".
I am documenting my analysis in detail for a few reasons:
- So that other people who may want to perform similar analysis for other Police Departments can understand and recreate my analysis
- So that every step is documented, and any mistake can be easily caught and fixed by the infinite supply of people on the internet who are smarter than me
- To maybe inspire people to use computers to investigate the things in the world that are important to them, and share the tools I use to do that
Network visualization prior work / inspiration
You may have seen network analysis like this before.
Jacob Silver used it to look at the spread of anti-vax material.
Adi Cohen has pioneered a method of combining Gephi with CrowdTangle to analyze the network of groups and pages sharing links.
Provenance
WNYC/Gothamist received the data in response to a Freedom of Information Law request and provided me an excel file for analysis.
The Dataset
Differences from data previously released by ProPublica
ProPublica released and covered similar data in July of 2020.
They chose to only publish data for "active-duty officers who’ve had at least one allegation against them substantiated by the CCRB".
The dataset we are working with today contains every complaint and officer, even those with no substantiated allegations.
It also contains officers who were listed as witnesses on complaints, including complaints found as "unsubstantiated" or "unfounded" by the CCRB.
This makes it a "noisier" dataset. In our case this can be an advantage since we are looking to visualize the network of officers.
Being named with another officer on a complaint, even if that complaint is unfounded, is a signal that those officers interacted in a way that was noticed by the public. Being the subject of an unfounded complaint together might even cause officers to form a tighter relationship. Because of that, I will incorporate witness data into our analysis.
NYPD internal structure as it relates to our data
The NYPD is divided into coverage areas within the 5 boroughs known as precincts. When I lived in Brooklyn, I lived in the 81st Precinct which covers Bed-Stuy.
The NYPD also has a number of units, like the Warrant Squad or Narcotics that span different precincts. An officer might report to a numbered precinct, but their command is Brooklyn Narcotics, and they are interacting with other officers in their unit more than the precinct they work out of. Our data reflects this.
Analysis
Overview exploration / metadata
The source dataset is an 81.2MB excel file that I received as FOIL2021-00167_Dataset.xlsx
. It has 3 tabs. The first has some general notes1 about the dataset.
The first tab has the title of OfficerAllegationHistory
and has 181,627 entries and 472 columns.
The second tab has the title of OfficersInvolvedInComplaints
and has 239,608 entries and 183 columns.
Analyzing our data with Datasette / SQLite
Once we convert our CSV files into SQLite .db
files we can use Datasette to get a sense of the data and slice off pieces for further analysis.
The first thing we might want to look at is the top commands that received complaints since 2010.
select [Officer Command At Incident], count([Officer Command At Incident]) from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2010-12-31'
group by [Officer Command At Incident] order by count([Officer Command At Incident]) DESC LIMIT 5
- 075 PCT: 320
- 046 PCT: 251
- 047 PCT: 226
- 120 PCT: 199
- BX IRT : 196
Or the top 10 commands whose complaints ended in penalties.
select [Officer Command At Incident], count([Officer Command At Incident]) from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2021-12-31'AND [CCRB Allegation Disposition] = 'Substantiated (Charges)'
group by [Officer Command At Incident] order by count([Officer Command At Incident]) DESC LIMIT 5
- WARRSEC: 91
- PBBX : 84
- 081 PCT: 71
- 079 PCT: 70
- 075 PCT: 68
After using Datasette to get a sense for the shape of our dataset, we can use it to filter out a slice of the data to use to feed into our next tool and begin doing our network analysis.
Ian Johnson helped me explore the data a bit in Observable as well.
We looked at when officers received complaints. Here each year is split into a bar. We are showing the officers who received the most complaints in the entire dataset.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036139/Screen_Shot_2021-05-26_at_11.56.10_AM.png)
Or stacking the bars- the darker they are, the more recent the complaints occurred.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036139/Screen_Shot_2021-05-26_at_12.03.47_PM.png)
You can also look at the CCRB outcomes and see how rarely cases are substantiated or end in discipline.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036139/Screen_Shot_2021-05-26_at_12.04.11_PM.png)
We also looked at which commands received the most complaints across the entire dataset.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036139/Screen_Shot_2021-05-26_at_12.03.54_PM.png)
Filtering out "Exonerated" and "Unfounded" complaints for our network
To get our network closer to a representation of officers who are receiving complaints for misconduct we want to filter out any of the cases in which the officer was Exonerated
or the CCRB's disposition was that it was Unfounded.
So to get every complaint filed since 2010 that wasn't marked as exonerated
or unfounded
we'll write a query like
select * from OfficerAllegationHistory
where ([Incident Date] BETWEEN '2010-01-01' AND '2021-12-31') AND ([CCRB Allegation Disposition] IS NOT 'Exonerated'
AND [CCRB Allegation Disposition] IS NOT 'Unfounded')
Which gives us a slightly more manageable 65,401 rows. We'll save these results off as a .csv for further analysis.
I had to do some funky stuff3 to fix the dates in SQLite, but once I did that, it was easy to filter by the date column.
We can also export new CSVs for all of the incidents since 2010 for a particular precinct:
select * from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2021-12-31' AND [Officer Command At Incident] = '075 PCT'
These precinct-specific slices are a bit smaller and more manageable for analysis in Observable.
We'll also want to export another CSV with ONLY complaints that were substantiated
select * from OfficerAllegationHistory
where ([Incident Date] BETWEEN '2010-01-01' AND '2021-12-31') AND ([CCRB Allegation Disposition] IS 'Substantiated (Charges)' OR [CCRB Allegation Disposition] IS 'Substantiated (Command Discipline A)'
OR [CCRB Allegation Disposition] IS 'Substantiated (Command Discipline B)'
Analyzing our data with Neo4J
I first encountered Neo4J when I was working with Ben Popken on an NBC News analysis of tweets tied to Senate Intelligence-identified Russian Twitter Bots where Neo4J provided analysts who were crucial to understanding the shape of our data.
It is an incredibly useful tool for generating and analyzing networks, and I was excited to have another dataset that would let me use its considerable power.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624506093/Screen_Shot_2021-05-31_at_12.05.00_PM.jpg)
Importing our CSV with Cypher
To import our .csv
into a network of node and relationships in Neo4J, we will use the Cypher query language, which makes this process really easy and the code is relatively readable and easy to follow.
Special thanks to David Allen at Neo4J for his guidance in writing queries and designing these relationships.
Basically we take the CSV files we exported from Datasette (when we filtered our Exonerated, Unfounded, and everything before 2010) and go through every row and push it into our network.
Creating officer nodes
First we tell Neo4J to use officer.id as a unique constraint (this makes things faster, I think?) and create a node for each officer from one CSV.
CREATE CONSTRAINT officerIdConstraint ON (officer:Officer) ASSERT officer.id IS UNIQUE
Then I loop through every line of the .csv and create a new Officer node for every new officer I see. I use MERGE
instead of CREATE
to make sure I don't duplicate officer nodes.
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`, lastName: csvLine.`Officer Last Name`, firstName: csvLine.`Officer First Name`, OfficersInvolvedInComplaints: true})
RETURN officer
This creates 29,915 unique officer nodes.
Then we bind more data into it from our other CSV
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`, lastName: csvLine.`Officer Last Name`, firstName: csvLine.`Officer First Name`, shieldNo: csvLine.`Shield No`, currentRank: csvLine.`Current Rank`, currentCommand: csvLine.`Current Command`, OfficerAllegationHistory: true})
RETURN officer
Now we have 113,265 unique officer nodes.
An officer node looks like this:
{
"identity": 00001,
"properties": {
"currentRank": "Police Officer",
"currentCommand": "81",
"lastName": "Smith",
"firstName": "John",
"shieldNo": "00001",
"id": "000001",
"OfficerAllegationHistory": true
}
}
Finally, we need to add a boolean to denote if an officer has ever had a charge substantiated. We'll use this later to please some lawyers. You'll see.
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-SUBSTANTIATED.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`})
SET officer.ccrbSubstantiatedBool = "true"
RETURN officer
Now we've marked 4,768 of New York's ~36,000 (13%) finest as having a substantiated complaint in the last 10 years.
Let's NOT do the same thing for OfficersInvolved - because that file contains officers who were merely witnesses to substantiated complaints, and we don't want to accidentally label a witness to a substantiated case.
Creating officer labels
Now we need to set labels for our nodes depending on whether they have ever had a complaint substantiated. We don't want to label nodes with names for any officers who may have complaints but have never had any substantiated. I have been told that lawyers think this is a good idea.
First we set every officer label to their unique ID
MATCH (o:Officer)
SET o.label = o.id
Then we look for officers with substantiated complaints and set their label to their full name.
MATCH (o:Officer {ccrbSubstantiatedBool: "true"})
SET o.label = COALESCE(o.firstName ,"") + ' ' + COALESCE(o.lastName ,"")
Now we have our officers created, we need to create our incidents.
Creating incident nodes
We are going to continue to use our CSV which filtered out incidents before 2010 or that were unfounded or exonerated.
First we tell Neo4J that we have unique incident IDs
CREATE CONSTRAINT incidentIdConstraint ON (incident:Incident) ASSERT incident.id IS UNIQUE
Then we create an incident for every row we see in OfficerAllegationHistory. We'll make note of the precinct the incident occurred in, what the specific allegation was, and what date the incident occurred.
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (incident:Incident {id: csvLine.`Complaint Id`})
SET incident.incidentPct = COALESCE(csvLine.`Precinct Of Incident Occurrence`,"N/A")
SET incident.ccrbDisposition = csvLine.`CCRB Allegation Disposition`
SET incident.allegation = csvLine.`Allegation`
set incident.date = csvLine.`Incident Date`
RETURN incident
Now let's do the same for OfficersInvolvedInComplaints.
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (incident:Incident {id: csvLine.`Complaint Id`})
SET incident.ccrbDisposition = csvLine.`Complaint Disposition`
RETURN incident
Now let's create labels for our incidents, which is going to be the allegation.
MATCH (i:Incident)
SET i.label = i.allegation
Now that we have our Incidents and our Officers we need to create our relationships between them.
Creating relationships between incidents and officers
Now for the fun part.
We are going to create a new relationship called INVOLVED_IN
, and officers can be INVOLVED_IN
one or many incidents. Incidents may have one or many officers that were INVOLVED_IN
it, either as witness or subject officers.
First we create our relationships from OfficersInvolved:
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MATCH (officer:Officer {id: csvLine.`Unique Officer Id`}) with csvLine, officer
MATCH (incident:Incident {id: csvLine.`Complaint Id`}) with csvLine, officer, incident
CREATE (officer)-[:INVOLVED_IN {status: csvLine.`Officer Status`}]->(incident)
Which creates 94,323 relationships.
Then from OfficerAllegationHistory:
LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MATCH (officer:Officer {id: csvLine.`Unique Officer Id`}) with csvLine, officer
MATCH (incident:Incident {id: csvLine.`Complaint Id`}) with csvLine, officer, incident
CREATE (officer)-[:INVOLVED_IN {allegation: csvLine.`Allegation`, type: csvLine.`FADO Type`}]->(incident)
Now we have 159,671 relationships. Sick.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036165/Screen_Shot_2021-05-29_at_2.28.27_PM.png)
Next we can flatten out our graph a little bit and remove incidents if we want.
We will create a new type of relationship that only occurs between two officers called CO_OCCURANCE
- we will only make one of these between each officer, but the weight of that link will be decided by how many complaints those officers appear together on.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036231/Screen_Shot_2021-05-29_at_3.48.19_PM.png)
So officers who appear on 3 complaints together have a CO_OCCURANCE
relationship with a weight of 3. This allows us to do some weighted degree analysis when we are making our layout, deciding how large to make nodes, and when we are detecting communities.
MATCH (o1:Officer)-[:INVOLVED_IN]->(i:Incident)<-[:INVOLVED_IN]-(o2:Officer) WHERE id(o1)<id(o2) with o1, o2, count(i) as weightCount CREATE (o1)-[:CO_OCCURANCE { weight: weightCount }]->(o2)
Officers without connections
MATCH (o:Officer) WHERE NOT (o)-[:CO_OCCURANCE]-() RETURN count(o)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624506696/Screen_Shot_2021-06-13_at_7.16.11_PM.jpg)
Eigenvector analysis on our network
Let's run a standard centrality analysis algorithm called "Eigenvector Centrality"
CALL gds.alpha.eigenvector.write({
nodeProjection: 'Officer',
relationshipProjection: 'CO_OCCURANCE',
relationshipProperties: 'weight',
relationshipWeightProperty: 'weight',
writeProperty: 'eigenvector'
})
YIELD nodes, iterations, dampingFactor, writeProperty
Now every Officer node has an eigenvector
value that represents its centrality across our entire NYPD-wide network. The larger the value, the more central that node is.
Analyzing our data with Gephi
Neo4J is cool for processing and analyzing tons of data, but I want to draw thousands of circles and lines now and start untangling the hairball of our network.
I am going to use Gephi, which I have a love-hate relationship with, but is unrivaled when it comes to network visualization. Plus, I already know how to use it.
We are going to stream our data from Neo4J to Gephi in order to leverage Neo4J's power to handle huge amounts of data (way more than Gephi) but still get to use Gephi's layout algorithms and analysis techniques.
Flattened co-occurance network
To get our flattened network, which removed incident nodes:
MATCH path=(o1:Officer)-[r:CO_OCCURANCE]->(o2:Officer) WITH o1, path limit 125000 with o1, collect(path) as paths call apoc.gephi.add(null,'workspace1', paths, 'weight', ['weight', 'id', 'eigenvector', 'firstName', 'lastName', 'label', 'date', 'currentCommand']) yield nodes, relationships, time return nodes, relationships, time ORDER BY o1.eigenvector DESC
This streams 100,000 edges and 25,064 nodes into Gephi.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036180/Screen_Shot_2021-05-29_at_3.03.54_PM.png)
We'll run the Force Atlas 2 layout algorithm in Gephi to have the nodes arrange themselves into some sort of sense.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036176/Screen_Shot_2021-05-29_at_2.51.20_PM.png)
We can use Gephi's modularity algorithm to color by "community" within our network, and we'll tweak our layout algorithm to separate things out a bit.
The communities that Gephi detects often mirror real-world precincts. As one might expect, officers appear on complaints with other officers in their precinct because they are working together most often.
I like that the algorithm detects communities that resemble precincts, and it actually gives me confidence that the community detection is working.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036310/Screen_Shot_2021-05-31_at_12.51.58_PM.png)
Then we can add some labels and we've made a map of the network of officers who appeared on CCRB complaints together, and there appear to be a number of closely-knit clusters and different communities within our network.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036311/Screen_Shot_2021-05-31_at_12.59.34_PM.png)
Precinct-specific networks including incidents (un-flattened network)
Let's put it all together and stream all the officers from a single precinct using only incidents since 2010.
MATCH path=(o1:Officer)-[:INVOLVED_IN]->(i:Incident {incidentPct: "75"})<-[:INVOLVED_IN]-(o2:Officer)
where i.date IS NOT NULL and apoc.date.parse(i.date, "ms", 'YYYY-mm-dd') > 1262304000000
WITH o1, path, i limit 100000 with o1, i, collect(path) as paths call apoc.gephi.add(null,'workspace1', paths, 'weight', ['weight', 'id', 'eigenvector', 'firstName', 'lastName', 'date']) yield nodes, relationships, time return nodes, relationships, time ORDER BY o1.eigenvector DESC
We can also make a network that is un-flattened, that is, we can see nodes for both officers as well as incidents. This way, we can see the patterns in the way incidents tie officers together.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036336/Screen_Shot_2021-06-02_at_6.32.45_PM.png)
Every blue circle here is an incident, and every red circle is an officer (both filtered by the 75th precinct, since 2010).
Now let's size the circles by their degree (the number of connections they have) and run a layout algorithm.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036346/Screen_Shot_2021-06-02_at_6.53.13_PM.png)
Some big nodes start to pop up, like Martinez, Radoncic, and Grieco. What is causing these officers to co-appear on so many different complaints with so many different officers?
When exploring this network, large nodes pop up, and I became naturally curious what the careers of those officers looked like. One of the first ones that jumped out to me was a big node that represented an officer named Nicholas Rios.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036085/Screen_Shot_2021-05-18_at_8.55.24_PM.png)
I googled his name, and one of the first results was a harrowing story of a federal civil rights case that named him.
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036085/Screen_Shot_2021-05-18_at_8.55.38_PM.png)
In the reporting that George Joseph did for Gothamist/WNYC, he found that a number of large influencers that appear in this network analysis also acted as influencers on the ground.
He looked at another large node in the network, Adnan Radoncic:
Atunbi asserted that Radoncic was a catalyst for a group assault on the street that day. “As soon as he grabbed me, all the officers was hands on,” he said. “It’s like they just followed his lead.”
Potential Next Steps
There are a few different directions for further analysis that I didn't have time for, but may result in interesting findings.
- Explorable NYPD-wide network
- Looking at protest complaints
- Looking at veterans influencing rookies
- Looking at the effects of NYPD discipline
- Officer career-specific visualization
- Analysis of length/outcomes of CCRB investigations
- Geographic analysis
Flotsam & Jetsam
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624505603/Screen_Shot_2021-06-23_at_9.54.21_PM.png)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624505843/Screen_Shot_2021-06-21_at_7.12.27_PM.png)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036381/Screen_Shot_2021-06-07_at_1.15.54_PM.png)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624506217/Screen_Shot_2021-05-29_at_12.37.37_PM.png)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036457/Screen_Shot_2021-06-13_at_10.31.55_PM.png)
![](https://res.cloudinary.com/ejf/image/upload/fl_progressive:semi,c_scale,dpr_auto,w_640/v1624036455/Screen_Shot_2021-06-13_at_10.33.01_PM.png)
Hire me to do work like this
I do freelance data exploration and visualization for clients who aren't evil. If you'd like to hire me to take a look at a dataset for you, just get in touch at [email protected]
Footnotes
- The notes say, basically: these are complaints received in or after the year 2000. Cases that are mediated or were attempted to be mediated are excluded. ↩
- OfficerAllegationHistory columns:
1: As Of Date
2: Allegation Record Identity
3: Unique Officer Id
4: Active Per Last Reported Status
5: Last Reported Active Date
6: Officer First Name
7: Officer Last Name
8: Officer Race
9: Officer Gender
10: Current Rank Abbreviation
11: Current Rank
12: Current Command
13: Shield No
14: Complaint Id
15: Incident Date
16: CCRB Received Date
17: Close Date
18: Officer Rank Abbreviation At Incident
19: Officer Rank At Incident
20: Officer Command At Incident
21: Officer Days On Force At Incident
22: Borough Of Incident Occurrence
23: Precinct Of Incident Occurrence
24: Location Type Of Incident
25: Reason for Police Contact
26: Outcome Of Police Encounter
27: Victim Age At Incident
28: Victim Race
29: Victim Gender
30: FADO Type
31: Allegation
32: CCRB Allegation Disposition
33: Board Discipline Recommendation
34: Non-APU Penalty Report Date
35: Officer Is APU
36: APU CCRB Trial Recommended Penalty
37: APU Trial Commissioner Recommended Penalty
38: APU Plea Agreed Penalty
39: APU Case Status
40: APU Closing Date
41: NYPD Allegation Disposition
42: NYPD Officer Penalty
43: Reconsideration Requested
44: Reconsideration Request Withdrawn
45: Reconsideration Request Rejected
46: Reconsideration Occurred
47: Reconsideration Decision Pending
:3 OfficersInvolvedInComplaints columns:↩1: As Of Date
2: Officer Status
3: Unique Officer Id
4: Active Per Last Reported Status
5: Last Reported Active Date
6: Officer First Name
7: Officer Last Name
8: Officer Race
9: Officer Gender
10: Current Rank Abbreviation
11: Current Rank
12: Current Command
13: Shield No
14: Complaint Id
15: Complaint Disposition
16: Incident Date
17: CCRB Recieved Date
18: Close Date
- I ran this in the CLI SQLite client to chop up the date string and re-write it in the way SQLite wants it:↩ ↩2 ↩3
update OfficerAllegationHistory set [Incident Date] = substr([Incident Date], -4) || '-' || substr('00' || ([Incident Date] + 0), -2, 2) || '-' || substr('00' || (substr([Incident Date], instr([Incident Date], '/') + 1) + 0), -2, 2);
The Lack of Data
If I wanted to, I could tell you the precise angle and speed of a small ball thrown in a game of professional baseball. I can tell you all sorts of information about each of the nearly 300 pitches thrown in any of the 162 games each baseball team plays each year.
As a person who cares about data, you can learn a lot by looking simply at what data is– and is not– collected. If I knew nothing else about human society besides the detail in which we gather data about each of our activities, I could wisely intuit that America seems to care a lot about baseball. Every detail of these gatherings, from each man's position on the field, to the number of people watching from the stands, is carefully noted and stored and archived for further use.
As a person who cares about data, when I want to learn more about something one of the first things I do is seek out the data that has been gathered about it. The amount of data being collected these days, as you may have heard, is reminiscent of oil. It is a commodity to be bought and sold, and the data surrounding our interests, desires, and willingness to part with our money are invisibly traded and are the cornerstones of America's largest businesses: Facebook, Google, Amazon, and a tumultuous school of would-be competitors following closely behind.
But when you go to look for data about, say, the number of people that the police have shot or killed in departments across America, there is an eery absence.
And once you have encountered one huge, important, absence of data you can't stop seeing them.
How many rape kits remain untested in America's largest cities? We don't really track that.
How many guns were registered per county in the last year? We don't really track that.
How many people were killed by police in the last year? Well, we didn't really track that until different newsrooms and non-profit organizations took it upon themselves to do the work the police should be doing, using their own resources to track and publish the statistics necessary to understand one of the biggest problems our society is facing.
It is certainly not because a lack of ability to track these things. If I drive by a house and want the data on how much it costs, the prices of all of it's previous sales, and it's score in various indices that rate it's proximity to schools, grocery stores, and public transit.
If I own a mall and I want to track the phone of every person who enters the door, I can rent a service that will do so. It tracks each person's device via Bluetooth or Wifi, and makes note of their path around the store and how long they stay in different sections. I might use this data to adjust the layout of my store to encourage more purchases.
We are able to track the things we care about to an incredible, awe-inspiring amount of detail. Sometimes, not always, that helps us understand the world around us and design ways to do things better, or at least differently, and see how the data changes.
But without a baseline, there is no way to determine if an intervention is successful or not. If we don't know a patient's cholesterol, we can't tell if the medicine we have given them is working or not. So we track those things and use all of that data to guide our actions.
#data #dataviz
Generative Art
Writing code that makes shapes move on a screen is a rewarding feeling.
It is a medium that lets you be surprised as an artist. I like making art where I can assemble tiny pieces and step back and be surprised at what I have made.
Working with certain materials have their own creative minds, like pottery or watercolors. You feel you are in a partnership with your material more than the master of it.
Making a creative partnership with computers is an incredibly pleasurable way to spend time, at least for me. Computers are incredibly powerful and have an infinite capacity to surprise. Finding new ways to tune into the computer's ability to contribute as creative partner and diviner of random numbers is a lot of fun for me.
Table of contents
My experiments in creative coding
I have never been able to create photo-realistic work when drawing or painting by hand. In drawing class growing up, I would find myself drawing an "icon" of the thing rather than of what it actually looked like.
My trees looked like robotic cutouts and nothing like an actual tree, organic and asymmetrical. I preferred to focus on outlines and silhouettes over shading. This played a part in my choice to pursue design over art as a teenager. Even when I pushed myself to focus more on lighting as a street photographer, I struggled to draw or paint what's in front of me in a literal way. I am still working on this through watercolors, focusing on organic shapes of plants has helped a lot.
Creating something with a computer as a partner often looks like creating a bunch of parameters that control different aspects: color, distance, size, direction and then handing those controls over to the computer to go wild with. From there it's a matter of tuning how much control to give over, and where.
My goal is always to be surprised by the work, and to come across something I could have never designed at the outset. It's a way to jam out and get into that flow of creation that is always eluding me.
417am1975: My Twitter art-bot
@417am1975 is a generative Twitter bot that tweeted randomized art every hour for a couple of years, running on a Heroku instance. He's currently taking a break.
I would create a new generative art script after work to unwind. I would save it into a directory that 417am would look at every time he wanted to make a tweet. He would run it with a new set of random numbers and tweet it out.
I liked this a lot because I would be surfing Twitter throughout the day and be pleasantly surprised with something I liked – something 417am had made based on one of my scripts. They were familiar but the special ones jumped out as something I never would have thought of.
Glitch Generative Audio Experiments
I made a series of little experiments on Glitch where I played with different javascript approaches to generative sound and music.
One of the things that I discovered is the importance of loops and repetition in what humans will consider music. I did a lot of approaches where the machine would just rumble forward, randomly picking the next note as it went, never repeating. It made some interesting and unusual sounding "music" but I never achieved what I set out to.
I think some next steps would be to use this as an engine and then provide some way for a user to "lock" the machine into its current parameters and loop it. Or even for the machine to have looping built into its DNA, which I did not experiment with much.
Javascript Generative Audio Tools
- Tone.js is a great tool for generating sounds in the browser, and you can create your own oscillators, FX chains and play back samples (plus adjust their speed, start, and end positions)
- Tonal is a library to help you do musical math, work with scales, chords, and more. This lets you play around in different common patterns of music without needing to become intricately familiar with it.
- Chance.js makes the experience of adding randomization to your project really pleasurable. I would feed it a musical scale I obtained from Tone.js and have it
.pick()
within that.
What I Want to Try More
I want to keep experimenting with more with different types of outputs.
Raspberry Pi / Installations
I experimented with using Processing in combination with a node script to create an automatic doomscroller that pulls the latest headlines from various news sources with a modified version of Brad Oyler's newsdivide. I tried to get a version of it running on a Raspberry Pi, but I ran into trouble capturing screenshots of the news sites with puppeteer on the raspberry pi – but I like the idea of creating a generative piece of art and running it in the background on a screen through the raspberry pi.
Audio Visualization
I've experimented in the past with the OS X VJ software CoGe which I really enjoyed for BPM-syncing effects and looping videos and GIFs. Unfortunately it doesn't run on Catalina and the new version hasn't been released yet. But as a fan of music and DJ'ing and live music creation, I think that creating visuals both for livestreaming and also, maybe, eventually, in-person parties is a really interesting outlet for generative art.
What I think the next 5-10 years will hold
Spatialization and Collaboration
I am excited to use shared online spaces to collaborate with friends and create generative art together. I want to create things where even a non-technical friend could come into a virtual space with a generative art piece, and start pulling knobs and levers and sliders and hitting buttons and seeing what the results are (and then save them out into a gallery for future visitors to explore).