June 11, 2021 · 23 photos · 16 min read

Finding Clusters of NYPD Officers In CCRB Complaint Data

Why?

Complaints filed against police officers by the public are often the first and only warning sign that a cop might be on a course of escalating violence.

In the deaths of George Floyd and Eric Garner, their killers had documented histories of complaints filed against them. Unfortunately, nothing was done to disrupt their patterns of abuse, and both cases ended with those officers killing members of the public they had sworn to protect.

George Floyd

Chauvin, who was fired, has said through his attorney that his handling of Floyd’s arrest was a reasonable use of authorized force. But he was the subject of at least 22 complaints or internal investigations during his more than 19 years at the department, only one of which resulted in discipline. These new interviews show not only that he may have used excessive force in the past, but that he had used startlingly similar techniques.

From: 'That could have been me': The people Derek Chauvin choked before George Floyd

The officer convicted of murdering George Floyd had at least 22 complaints against him. The officer who put Eric Garner in a chokehold and killed him had 7 complaints filed against him.

Eric Garner

Before he put Garner in the chokehold, the records show, he had seven disciplinary complaints and 14 individual allegations lodged against him. Four of those allegations were substantiated by an independent review board.

From: The disturbing secret history of the NYPD officer who killed Eric Garner

Of the 14 individual allegations against Garner's killer, 5 were for force: "hit against inanimate object", "physical force", and a single complaint in 2014 that foreshadowed the behavior that would eventually end the officer's career: "Force - Chokehold".

I am documenting my analysis in detail for a few reasons:

  • So that other people who may want to perform similar analyses for other police departments can understand and recreate my work
  • So that every step is documented, and any mistake can be easily caught and fixed by the infinite supply of people on the internet who are smarter than me
  • To maybe inspire people to use computers to investigate the things in the world that are important to them, and share the tools I use to do that

Network visualization prior work / inspiration

You may have seen network analysis like this before.

Jacob Silver used it to look at the spread of anti-vax material.

Adi Cohen has pioneered a method of combining Gephi with CrowdTangle to analyze the network of groups and pages sharing links.

Provenance

WNYC/Gothamist received the data in response to a Freedom of Information Law request and provided me with an Excel file for analysis.

The Dataset

Differences from data previously released by ProPublica

ProPublica released and covered similar data in July of 2020.

They chose to only publish data for "active-duty officers who’ve had at least one allegation against them substantiated by the CCRB".

The dataset we are working with today contains every complaint and officer, even those with no substantiated allegations.

It also contains officers who were listed as witnesses on complaints, including complaints the CCRB found to be "unsubstantiated" or "unfounded".

This makes it a "noisier" dataset. In our case this can be an advantage since we are looking to visualize the network of officers.

Being named with another officer on a complaint, even if that complaint is unfounded, is a signal that those officers interacted in a way that was noticed by the public. Being the subject of an unfounded complaint together might even cause officers to form a tighter relationship. Because of that, I will incorporate witness data into our analysis.

NYPD internal structure as it relates to our data

The NYPD is divided into coverage areas, known as precincts, within the 5 boroughs. When I lived in Brooklyn, I lived in the 81st Precinct, which covers Bed-Stuy.

The NYPD also has a number of units, like the Warrant Squad or Narcotics, that span different precincts. An officer might report to a numbered precinct, but if their command is Brooklyn Narcotics, they interact with the other officers in their unit more than with the precinct they work out of. Our data reflects this.

Analysis

Overview exploration / metadata

The source dataset is an 81.2 MB Excel file that I received as FOIL2021-00167_Dataset.xlsx. It has 3 tabs. The first has some general notes [1] about the dataset.

The second tab is titled OfficerAllegationHistory and has 181,627 entries and 47 columns [2].

The third tab is titled OfficersInvolvedInComplaints and has 239,608 entries and 18 columns [3].

Analyzing our data with Datasette / SQLite

Once we convert our CSV files into SQLite .db files we can use Datasette to get a sense of the data and slice off pieces for further analysis.
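If you haven't done this before, the import can be done in the SQLite command-line client (the same client used for the date fix in footnote 4). A minimal sketch, assuming the two data tabs were first exported from Excel as OfficerAllegationHistory.csv and OfficersInvolvedInComplaints.csv, and that we call the database ccrb.db:

.open ccrb.db
.mode csv
.import OfficerAllegationHistory.csv OfficerAllegationHistory
.import OfficersInvolvedInComplaints.csv OfficersInvolvedInComplaints

Running datasette ccrb.db then serves the database locally in the browser.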

The first thing we might want to look at is the top five commands that received complaints in 2010.

select [Officer Command At Incident], count([Officer Command At Incident])
from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2010-12-31'
group by [Officer Command At Incident]
order by count([Officer Command At Incident]) DESC
LIMIT 5
  • 075 PCT: 320
  • 046 PCT: 251
  • 047 PCT: 226
  • 120 PCT: 199
  • BX IRT : 196

Or the top five commands whose complaints ended in penalties.

select [Officer Command At Incident], count([Officer Command At Incident])
from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2021-12-31'
  AND [CCRB Allegation Disposition] = 'Substantiated (Charges)'
group by [Officer Command At Incident]
order by count([Officer Command At Incident]) DESC
LIMIT 5
  • WARRSEC: 91
  • PBBX : 84
  • 081 PCT: 71
  • 079 PCT: 70
  • 075 PCT: 68

After using Datasette to get a sense of the shape of our dataset, we can use it to filter out a slice of the data to feed into our next tool and begin our network analysis.

Ian Johnson helped me explore the data a bit in Observable as well.

We looked at when officers received complaints. Here each year is split into a bar. We are showing the officers who received the most complaints in the entire dataset.

Or stacking the bars: the darker they are, the more recently the complaints occurred.

You can also look at the CCRB outcomes and see how rarely cases are substantiated or end in discipline.

We also looked at which commands received the most complaints across the entire dataset.

Filtering out "Exonerated" and "Unfounded" complaints for our network

To get our network closer to a representation of officers who are receiving complaints for misconduct we want to filter out any of the cases in which the officer was Exonerated or the CCRB's disposition was that it was Unfounded.

So to get every complaint filed since 2010 that wasn't marked as exonerated or unfounded, we'll write a query like:

select * from OfficerAllegationHistory
where ([Incident Date] BETWEEN '2010-01-01' AND '2021-12-31') AND ([CCRB Allegation Disposition] IS NOT 'Exonerated'
AND [CCRB Allegation Disposition] IS NOT 'Unfounded')

Which gives us a slightly more manageable 65,401 rows. We'll save these results off as a .csv for further analysis.

I had to do some funky stuff [4] to fix the dates in SQLite, but once I did that, it was easy to filter by the date column.

We can also export new CSVs for all of the incidents since 2010 for a particular precinct:

select * from OfficerAllegationHistory
where [Incident Date] BETWEEN '2010-01-01' AND '2021-12-31' AND [Officer Command At Incident] = '075 PCT'

These precinct-specific slices are a bit smaller and more manageable for analysis in Observable.

We'll also want to export another CSV with ONLY complaints that were substantiated:

select * from OfficerAllegationHistory
where ([Incident Date] BETWEEN '2010-01-01' AND '2021-12-31')
  AND ([CCRB Allegation Disposition] IS 'Substantiated (Charges)'
  OR [CCRB Allegation Disposition] IS 'Substantiated (Command Discipline A)'
  OR [CCRB Allegation Disposition] IS 'Substantiated (Command Discipline B)')

Analyzing our data with Neo4J

I first encountered Neo4J while working with Ben Popken on an NBC News analysis of tweets tied to Russian Twitter bots identified by the Senate Intelligence Committee, where Neo4J provided analysts who were crucial to understanding the shape of our data.

It is an incredibly useful tool for generating and analyzing networks, and I was excited to have another dataset that would let me use its considerable power.

Importing our CSV with Cypher

To import our .csv into a network of nodes and relationships in Neo4J, we will use the Cypher query language, which makes the process straightforward; the code is relatively readable and easy to follow.

Special thanks to David Allen at Neo4J for his guidance in writing queries and designing these relationships.

Basically, we take the CSV files we exported from Datasette (where we filtered out Exonerated, Unfounded, and everything before 2010), go through every row, and push it into our network.

Creating officer nodes

First we tell Neo4J to enforce a unique constraint on officer.id (the constraint is backed by an index, which speeds up the MERGE lookups below) and create a node for each officer from one CSV.

CREATE CONSTRAINT officerIdConstraint ON (officer:Officer) ASSERT officer.id IS UNIQUE

Then I loop through every line of the .csv and create a new Officer node for every new officer I see. I use MERGE instead of CREATE to make sure I don't duplicate officer nodes.

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`, lastName: csvLine.`Officer Last Name`, firstName: csvLine.`Officer First Name`, OfficersInvolvedInComplaints: true})
RETURN officer

This creates 29,915 unique officer nodes.

Then we bind more data into it from our other CSV:

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`, lastName: csvLine.`Officer Last Name`, firstName: csvLine.`Officer First Name`, shieldNo: csvLine.`Shield No`, currentRank: csvLine.`Current Rank`, currentCommand: csvLine.`Current Command`, OfficerAllegationHistory: true})
RETURN officer

Now we have 113,265 unique officer nodes.

An officer node looks like this:

{
  "identity": 1,
  "properties": {
    "currentRank": "Police Officer",
    "currentCommand": "81",
    "lastName": "Smith",
    "firstName": "John",
    "shieldNo": "00001",
    "id": "000001",
    "OfficerAllegationHistory": true
  }
}

Finally, we need to add a flag to denote whether an officer has ever had a charge substantiated. We'll use this later to please some lawyers. You'll see.

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-SUBSTANTIATED.csv" AS csvLine
MERGE (officer:Officer {id: csvLine.`Unique Officer Id`})
SET officer.ccrbSubstantiatedBool = "true"
RETURN officer

Now we've marked 4,768 (roughly 13%) of New York's ~36,000 finest as having a substantiated complaint in the last 10 years.

Let's NOT do the same thing for OfficersInvolved, because that file contains officers who were merely witnesses to substantiated complaints, and we don't want to accidentally label a witness to a substantiated case.

Creating officer labels

Now we need to set labels for our nodes depending on whether they have ever had a complaint substantiated. We don't want to label nodes with names for any officers who may have complaints but have never had any substantiated. I have been told that lawyers think this is a good idea.

First we set every officer's label to their unique ID:

MATCH (o:Officer)
SET o.label = o.id

Then we look for officers with substantiated complaints and set their label to their full name.

MATCH (o:Officer {ccrbSubstantiatedBool: "true"})
SET o.label = COALESCE(o.firstName ,"") + ' ' + COALESCE(o.lastName ,"")

Now that we have our officers created, we need to create our incidents.

Creating incident nodes

We are going to continue to use our CSV which filtered out incidents before 2010 or that were unfounded or exonerated.

First we tell Neo4J that we have unique incident IDs:

CREATE CONSTRAINT incidentIdConstraint ON (incident:Incident) ASSERT incident.id IS UNIQUE

Then we create an incident for every row we see in OfficerAllegationHistory. We'll make note of the precinct the incident occurred in, what the specific allegation was, and what date the incident occurred.

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (incident:Incident {id: csvLine.`Complaint Id`})
SET incident.incidentPct = COALESCE(csvLine.`Precinct Of Incident Occurrence`,"N/A")
SET incident.ccrbDisposition = csvLine.`CCRB Allegation Disposition`
SET incident.allegation = csvLine.`Allegation`
SET incident.date = csvLine.`Incident Date`
RETURN incident

Now let's do the same for OfficersInvolvedInComplaints.

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MERGE (incident:Incident {id: csvLine.`Complaint Id`})
SET incident.ccrbDisposition = csvLine.`Complaint Disposition`
RETURN incident

Now let's create labels for our incidents; each incident's label will be its allegation.

MATCH (i:Incident)
SET i.label = i.allegation

Now that we have our Incidents and our Officers we need to create our relationships between them.

Creating relationships between incidents and officers

Now for the fun part.

We are going to create a new relationship called INVOLVED_IN, and officers can be INVOLVED_IN one or many incidents. Incidents may have one or many officers that were INVOLVED_IN it, either as witness or subject officers.

First we create our relationships from OfficersInvolved:

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficersInvolvedInComplaints_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MATCH (officer:Officer {id: csvLine.`Unique Officer Id`}) with csvLine, officer
MATCH (incident:Incident {id: csvLine.`Complaint Id`}) with csvLine, officer, incident
CREATE (officer)-[:INVOLVED_IN {status: csvLine.`Officer Status`}]->(incident)

Which creates 94,323 relationships.

Then from OfficerAllegationHistory:

LOAD CSV WITH HEADERS FROM "http://localhost:11001/project-185bb75c-b944-451a-9c0f-aeba860ae68a/OfficerAllegationHistory_FILTERED-SINCE2010-NOT-UNFOUNDED-EXONERATED.csv" AS csvLine
MATCH (officer:Officer {id: csvLine.`Unique Officer Id`}) with csvLine, officer
MATCH (incident:Incident {id: csvLine.`Complaint Id`}) with csvLine, officer, incident
CREATE (officer)-[:INVOLVED_IN {allegation: csvLine.`Allegation`, type: csvLine.`FADO Type`}]->(incident)

Now we have 159,671 relationships. Sick.

Next we can flatten out our graph a little bit and remove incidents if we want.

We will create a new type of relationship that only occurs between two officers, called CO_OCCURANCE. We will make only one of these between each pair of officers, but the weight of that link will be decided by how many complaints those officers appear on together.

So officers who appear on 3 complaints together have a CO_OCCURANCE relationship with a weight of 3. This allows us to do some weighted degree analysis when we are making our layout, deciding how large to make nodes, and when we are detecting communities.

MATCH (o1:Officer)-[:INVOLVED_IN]->(i:Incident)<-[:INVOLVED_IN]-(o2:Officer)
WHERE id(o1) < id(o2)
WITH o1, o2, count(i) AS weightCount
CREATE (o1)-[:CO_OCCURANCE { weight: weightCount }]->(o2)
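As a quick sanity check on the flattening, we can list the pairs with the heaviest links, i.e. the officers who appear on the most complaints together (a simple read query against the relationship and properties we just created):

MATCH (o1:Officer)-[r:CO_OCCURANCE]->(o2:Officer)
RETURN o1.label, o2.label, r.weight
ORDER BY r.weight DESC
LIMIT 10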

Officers without connections

As a sanity check, we can also count the officers who never co-appear with anyone, i.e. nodes with no CO_OCCURANCE relationship at all:

MATCH (o:Officer) WHERE NOT (o)-[:CO_OCCURANCE]-() RETURN count(o)

Eigenvector analysis on our network

Let's run a standard centrality analysis algorithm called "Eigenvector Centrality":

CALL gds.alpha.eigenvector.write({
  nodeProjection: 'Officer',
  relationshipProjection: 'CO_OCCURANCE',
  relationshipProperties: 'weight',
  relationshipWeightProperty: 'weight',
  writeProperty: 'eigenvector'
})
YIELD nodes, iterations, dampingFactor, writeProperty

Now every Officer node has an eigenvector value that represents its centrality across our entire NYPD-wide network. The larger the value, the more central that node is.
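A quick way to inspect the results is to pull the highest-scoring nodes. A minimal sketch (remember that, per our labeling rule above, o.label is a bare ID unless the officer has a substantiated complaint):

MATCH (o:Officer)
WHERE exists(o.eigenvector)
RETURN o.label, o.eigenvector
ORDER BY o.eigenvector DESC
LIMIT 10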

Analyzing our data with Gephi

Neo4J is cool for processing and analyzing tons of data, but I want to draw thousands of circles and lines now and start untangling the hairball of our network.

I am going to use Gephi, which I have a love-hate relationship with but which is unrivaled when it comes to network visualization. Plus, I already know how to use it.

We are going to stream our data from Neo4J to Gephi in order to leverage Neo4J's power to handle huge amounts of data (way more than Gephi) but still get to use Gephi's layout algorithms and analysis techniques.

Flattened co-occurance network

To get our flattened network, with incident nodes removed:

MATCH path=(o1:Officer)-[r:CO_OCCURANCE]->(o2:Officer)
WITH o1, path LIMIT 125000
WITH o1, collect(path) AS paths
CALL apoc.gephi.add(null, 'workspace1', paths, 'weight', ['weight', 'id', 'eigenvector', 'firstName', 'lastName', 'label', 'date', 'currentCommand'])
YIELD nodes, relationships, time
RETURN nodes, relationships, time
ORDER BY o1.eigenvector DESC

This streams 100,000 edges and 25,064 nodes into Gephi.

We'll run the Force Atlas 2 layout algorithm in Gephi to have the nodes arrange themselves into some sort of sense.

We can use Gephi's modularity algorithm to color by "community" within our network, and we'll tweak our layout algorithm to separate things out a bit.

The communities that Gephi detects often mirror real-world precincts. As one might expect, officers appear on complaints with other officers in their precinct because they are working together most often.

I like that the algorithm detects communities that resemble precincts, and it actually gives me confidence that the community detection is working.

Then we can add some labels, and we've made a map of the network of officers who appeared on CCRB complaints together. There appear to be a number of closely knit clusters and distinct communities within our network.

Precinct-specific networks including incidents (un-flattened network)

Let's put it all together and stream all the officers from a single precinct, using only incidents since 2010. (The 1262304000000 in the query below is 2010-01-01 expressed in epoch milliseconds.)

MATCH path=(o1:Officer)-[:INVOLVED_IN]->(i:Incident {incidentPct: "75"})<-[:INVOLVED_IN]-(o2:Officer)
WHERE i.date IS NOT NULL AND apoc.date.parse(i.date, "ms", 'yyyy-MM-dd') > 1262304000000
WITH o1, path, i LIMIT 100000
WITH o1, i, collect(path) AS paths
CALL apoc.gephi.add(null, 'workspace1', paths, 'weight', ['weight', 'id', 'eigenvector', 'firstName', 'lastName', 'date'])
YIELD nodes, relationships, time
RETURN nodes, relationships, time
ORDER BY o1.eigenvector DESC

We can also make a network that is un-flattened, that is, we can see nodes for both officers as well as incidents. This way, we can see the patterns in the way incidents tie officers together.

Every blue circle here is an incident, and every red circle is an officer (both filtered by the 75th precinct, since 2010).

Now let's size the circles by their degree (the number of connections they have) and run a layout algorithm.

Some big nodes start to pop up, like Martinez, Radoncic, and Grieco. What is causing these officers to co-appear on so many different complaints with so many different officers?

When exploring this network, large nodes pop up, and I became curious about what those officers' careers looked like. One of the first that jumped out at me was a big node representing an officer named Nicholas Rios.

I googled his name, and one of the first results was a harrowing story of a federal civil rights case that named him.

In the reporting that George Joseph did for Gothamist/WNYC, he found that a number of the officers who appear as large influencers in this network analysis also acted as influencers on the ground.

He looked at another large node in the network, Adnan Radoncic:

Atunbi asserted that Radoncic was a catalyst for a group assault on the street that day. “As soon as he grabbed me, all the officers was hands on,” he said. “It’s like they just followed his lead.”

Potential Next Steps

There are a few different directions for further analysis that I didn't have time for, but may result in interesting findings.

  • Explorable NYPD-wide network
  • Looking at protest complaints
  • Looking at veterans influencing rookies
  • Looking at the effects of NYPD discipline
  • Officer career-specific visualization
  • Analysis of length/outcomes of CCRB investigations
  • Geographic analysis

Flotsam & Jetsam

Hire me to do work like this

I do freelance data exploration and visualization for clients who aren't evil. If you'd like to hire me to take a look at a dataset for you, just get in touch at [email protected]

Footnotes

  1. The notes say, basically: these are complaints received in or after the year 2000. Cases that are mediated or were attempted to be mediated are excluded.
  2. OfficerAllegationHistory columns:

    1: As Of Date
    2: Allegation Record Identity
    3: Unique Officer Id
    4: Active Per Last Reported Status
    5: Last Reported Active Date
    6: Officer First Name
    7: Officer Last Name
    8: Officer Race
    9: Officer Gender
    10: Current Rank Abbreviation
    11: Current Rank
    12: Current Command
    13: Shield No
    14: Complaint Id
    15: Incident Date
    16: CCRB Received Date
    17: Close Date
    18: Officer Rank Abbreviation At Incident
    19: Officer Rank At Incident
    20: Officer Command At Incident
    21: Officer Days On Force At Incident
    22: Borough Of Incident Occurrence
    23: Precinct Of Incident Occurrence
    24: Location Type Of Incident
    25: Reason for Police Contact
    26: Outcome Of Police Encounter
    27: Victim Age At Incident
    28: Victim Race
    29: Victim Gender
    30: FADO Type
    31: Allegation
    32: CCRB Allegation Disposition
    33: Board Discipline Recommendation
    34: Non-APU Penalty Report Date
    35: Officer Is APU
    36: APU CCRB Trial Recommended Penalty
    37: APU Trial Commissioner Recommended Penalty
    38: APU Plea Agreed Penalty
    39: APU Case Status
    40: APU Closing Date
    41: NYPD Allegation Disposition
    42: NYPD Officer Penalty
    43: Reconsideration Requested
    44: Reconsideration Request Withdrawn
    45: Reconsideration Request Rejected
    46: Reconsideration Occurred
    47: Reconsideration Decision Pending


  3. OfficersInvolvedInComplaints columns:

    1: As Of Date
    2: Officer Status
    3: Unique Officer Id
    4: Active Per Last Reported Status
    5: Last Reported Active Date
    6: Officer First Name
    7: Officer Last Name
    8: Officer Race
    9: Officer Gender
    10: Current Rank Abbreviation
    11: Current Rank
    12: Current Command
    13: Shield No
    14: Complaint Id
    15: Complaint Disposition
    16: Incident Date
    17: CCRB Recieved Date
    18: Close Date

  4. I ran this in the CLI SQLite client to chop up the date string and re-write it in the way SQLite wants it:
    -- The raw dates appear to be in M/D/YYYY form; rebuild them as YYYY-MM-DD
    -- so that SQLite's date comparisons and BETWEEN filters work:
    update OfficerAllegationHistory
    set [Incident Date] =
      substr([Incident Date], -4) || '-' ||   -- year: the last 4 characters
      substr('00' || ([Incident Date] + 0), -2, 2) || '-' ||   -- month: the leading number, zero-padded
      substr('00' || (substr([Incident Date], instr([Incident Date], '/') + 1) + 0), -2, 2);   -- day: the number after the first '/', zero-padded
    
May 17, 2021 · 3 min read

The Lack of Data

If I wanted to, I could tell you the precise angle and speed of a small ball thrown in a game of professional baseball. I can tell you all sorts of information about each of the nearly 300 pitches thrown in any of the 162 games each baseball team plays each year.

As a person who cares about data, you can learn a lot simply by looking at what data is, and is not, collected. If I knew nothing else about human society besides the detail in which we gather data about each of our activities, I could wisely intuit that America seems to care a lot about baseball. Every detail of these gatherings, from each man's position on the field to the number of people watching from the stands, is carefully noted, stored, and archived for further use.

As a person who cares about data, when I want to learn more about something one of the first things I do is seek out the data that has been gathered about it. The amount of data being collected these days, as you may have heard, is reminiscent of oil. It is a commodity to be bought and sold, and the data surrounding our interests, desires, and willingness to part with our money are invisibly traded and are the cornerstones of America's largest businesses: Facebook, Google, Amazon, and a tumultuous school of would-be competitors following closely behind.

But when you go to look for data about, say, the number of people that the police have shot or killed in departments across America, there is an eerie absence.

And once you have encountered one huge, important absence of data, you can't stop seeing them.

How many rape kits remain untested in America's largest cities? We don't really track that.

How many guns were registered per county in the last year? We don't really track that.

How many people were killed by police in the last year? Well, we didn't really track that until different newsrooms and non-profit organizations took it upon themselves to do the work the police should be doing, using their own resources to track and publish the statistics necessary to understand one of the biggest problems our society is facing.

It is certainly not because of a lack of ability to track these things. If I drive by a house and want the data on how much it costs, the prices of all of its previous sales, and its score in various indices that rate its proximity to schools, grocery stores, and public transit, all of it is a few taps away.

If I own a mall and I want to track the phone of every person who enters the door, I can rent a service that will do so. It tracks each person's device via Bluetooth or Wifi, and makes note of their path around the store and how long they stay in different sections. I might use this data to adjust the layout of my store to encourage more purchases.

We are able to track the things we care about to an incredible, awe-inspiring amount of detail. Sometimes, not always, that helps us understand the world around us and design ways to do things better, or at least differently, and see how the data changes.

But without a baseline, there is no way to determine if an intervention is successful or not. If we don't know a patient's cholesterol, we can't tell if the medicine we have given them is working or not. So we track those things and use all of that data to guide our actions.

#data #dataviz

February 10, 2021 · 5 min read

Generative Art

Writing code that makes shapes move on a screen is a rewarding feeling.

It is a medium that lets you be surprised as an artist. I like making art where I can assemble tiny pieces and step back and be surprised at what I have made.

Certain materials, like pottery or watercolors, have their own creative minds. You feel you are in a partnership with your material more than the master of it.

Making a creative partnership with computers is, at least for me, an incredibly pleasurable way to spend time. Computers are powerful and have an infinite capacity to surprise. Finding new ways to tune into the computer's ability to contribute as a creative partner and diviner of random numbers is a lot of fun.

My experiments in creative coding

I have never been able to create photo-realistic work when drawing or painting by hand. In drawing class growing up, I would find myself drawing an "icon" of the thing rather than of what it actually looked like.

My trees looked like robotic cutouts and nothing like an actual tree, organic and asymmetrical. I preferred to focus on outlines and silhouettes over shading. This played a part in my choice to pursue design over art as a teenager. Even when I pushed myself to focus more on lighting as a street photographer, I struggled to draw or paint what's in front of me in a literal way. I am still working on this through watercolors; focusing on the organic shapes of plants has helped a lot.

Creating something with a computer as a partner often looks like building a bunch of parameters that control different aspects (color, distance, size, direction) and then handing those controls over to the computer to go wild with. From there, it's a matter of tuning how much control to give over, and where.

My goal is always to be surprised by the work, and to come across something I could have never designed at the outset. It's a way to jam out and get into that flow of creation that is always eluding me.

417am1975: My Twitter art-bot

@417am1975 is a generative Twitter bot that tweeted randomized art every hour for a couple of years, running on a Heroku instance. He's currently taking a break.

I would create a new generative art script after work to unwind. I would save it into a directory that 417am would look at every time he wanted to make a tweet. He would run it with a new set of random numbers and tweet it out.

I liked this a lot because I would be surfing Twitter throughout the day and be pleasantly surprised with something I liked – something 417am had made based on one of my scripts. They were familiar but the special ones jumped out as something I never would have thought of.

Glitch Generative Audio Experiments

I made a series of little experiments on Glitch where I played with different JavaScript approaches to generative sound and music.

One of the things that I discovered is the importance of loops and repetition in what humans will consider music. I tried a lot of approaches where the machine would just rumble forward, randomly picking the next note as it went, never repeating. It made some interesting and unusual-sounding "music", but I never achieved what I set out to.

I think some next steps would be to use this as an engine and then provide some way for a user to "lock" the machine into its current parameters and loop it. Or even for the machine to have looping built into its DNA, which I did not experiment with much.

Javascript Generative Audio Tools
  • Tone.js is a great tool for generating sounds in the browser, and you can create your own oscillators, FX chains and play back samples (plus adjust their speed, start, and end positions)
  • Tonal is a library to help you do musical math, work with scales, chords, and more. This lets you play around in different common patterns of music without needing to become intricately familiar with it.
  • Chance.js makes the experience of adding randomization to your project really pleasurable. I would feed it a musical scale I obtained from Tonal and have it .pick() within that (see the sketch after this list).
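To make that concrete, here is a minimal sketch of how the three libraries fit together. This is an illustration, not one of 417am's actual scripts: Tonal supplies the notes of a scale, Chance picks from them, and Tone.js plays the picks in a loop, the looping being what makes the output read as music.

import * as Tone from "tone";
import { Scale } from "tonal";
import Chance from "chance";

const chance = new Chance();
// The notes of C major in the fourth octave: ["C4", "D4", ..., "B4"]
const notes = Scale.get("C4 major").notes;
const synth = new Tone.Synth().toDestination();

// Every quarter note, pick a random note from the scale and play it
// for an eighth note. (pickone is the current name for Chance's
// deprecated pick() method.)
new Tone.Loop((time) => {
  synth.triggerAttackRelease(chance.pickone(notes), "8n", time);
}, "4n").start(0);

// In a browser, audio can only start after a user gesture, e.g. a
// click handler that calls Tone.start() before starting the transport.
Tone.Transport.start();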

What I Want to Try More

I want to keep experimenting with more types of outputs.

Raspberry Pi / Installations

I experimented with using Processing in combination with a Node script to create an automatic doomscroller that pulls the latest headlines from various news sources with a modified version of Brad Oyler's newsdivide. I tried to get a version of it running on a Raspberry Pi, but I ran into trouble capturing screenshots of the news sites with Puppeteer there. Still, I like the idea of creating a generative piece of art and running it in the background on a screen through the Raspberry Pi.

Audio Visualization

I've experimented in the past with the OS X VJ software CoGe which I really enjoyed for BPM-syncing effects and looping videos and GIFs. Unfortunately it doesn't run on Catalina and the new version hasn't been released yet. But as a fan of music and DJ'ing and live music creation, I think that creating visuals both for livestreaming and also, maybe, eventually, in-person parties is a really interesting outlet for generative art.

What I think the next 5-10 years will hold

Spatialization and Collaboration

I am excited to use shared online spaces to collaborate with friends and create generative art together. I want to create things where even a non-technical friend could come into a virtual space with a generative art piece, and start pulling knobs and levers and sliders and hitting buttons and seeing what the results are (and then save them out into a gallery for future visitors to explore).