Long Gun Registry Data: What’s in the scraps?

Posted: October 13, 2015 in Firearms News

Back on May 15th Global News had a piece that included a downloadable link to scraps of largely redacted LGR data. I downloaded the file on a whim and have been sitting on it, thinking about what could be done with it’s contents.  As I write, I can’t see this being a one-of article, but rather the first of a series recounting what is uncovered in it’s contents.

The file is of tab delimited format, is malformed and generally too large for most people to deal with.  It’s been sitting in my download folder for months now and only recently have I had a chance to play around with it a bit.

Sitting in a flat file is pretty useless, so as a first task, I created a MySQL schema and a table to contain the initial data-set.  The file contains 8,016,342 clean rows, and at a quick glance, near 100 that are malformed feeding next row records into fields on a trouble row, like the line breaks aren’t in the right spot.  Being statistically insignificant, I’m just going to ignore the corrupt rows for now.

As the file is so large, I just exported a few rows from the top to cast what I thought would be reasonable data-types for my first go at this table.  The rows are structured into 16 tab delimited columns

  1. Make of the firearm
  2. Model
  3. Manufacturer
  4. Type (One of:  Rifle, Shotgun, Handgun, Commercial Version, Other, Combination Gun, Submachine Gun, Not Available, Machine Gun, Grenade Launcher, Mortar, BC, AIR PISTOL, , ON, Cannon)
  5. Action (One of: Semi-Automatic, Bolt Action, Pump Action, Lever Action, Hinge Break, Single Shot, Revolver, Multi Barrel, Air, spring or gas, Other, Converted Auto, Full Automatic, Not Available, Pistol, V4, , P9)
  6. Classification (One of: Non-restricted, Restricted, Prohibited, Unknown, Individual, , Antique, NOT APPLICABLE)
  7. Barrel length in mm
  8. Calibre (Don’t even get me started on this)
  9. Capacity
  10. Date of Registration
  11. Province of Registration
  12. Postal Code Fragment (first 2 of the FSA)
  13. Registration Type (Business, Individual or Museum)
  14. Stolen Date
  15. Lost Date
  16. Recovered Date

A table was created, based on the data observed in those first few rows from the top of the file, and I proceeded to write a quick script to read the file in chunks and write them into the database;  easy enough I thought.

Quickly I came to realize I was up against a data product born of bureaucracy, lack of forethought and oversight.

The import stops;  some of the datatypes selected were not suitable for the data contained further into the file and MySQL is moaning about it.

Take “Capacity” for example, one might expect this value to be a positive integer, affording plenty of space for the number of bullets a firearm could contain.  Wrong.  While generally, the number in the “Capacity” column seems sane, there are 166 distinct values for this ranging from NULL to 861 including a textual reference named “Multi shots”.  Astonished, I revised the “Capacity” to support character strings.  A little trial and error and most of the column sizing issues were sorted out and the import proceeded

It’s a big data-set and my PC is groaning from obsolescence and competing memory requirements; but I am able to freely query the data and run some aggregate functions.  What follows are the first few impressions.

The “Calibre” is simply bewildering; in this file there are 94,798 different calibres listed. A two-worksheet Excel file to make your eyes bleed.

LGR Calibres

For the sake of illustration, lets focus in on plentiful .22LR calibre.  On the second worksheet named “.22 Confusion”, there are 23013 rows that contain “22”;  while some are clearly different calibres (223 REM, .223, .223 Remington etc. etc.) and rightly included as part of the query, the confusion at the Firearms Centre is now clear.  The data captured was largely free-form, with probably nothing more than type and size limits applied.

Little or no effort to normalize the inbound data has been performed;  either at the hand of data entry, the people working the phones, or the venerable “verifier network”.  The data was entered verbatim and, as imported.

Calibre Highlights:

CHAMBERED 308 WINCHESTER
3 BARREL SET
24 ESTIMATED
7MM HIGH POWERED RIFFLE
12 CALIBRE CHOKE
77MM REM MAG

It would be amusing only if it didn’t get much worse.

The column of “Make” is comprised of 163,616 records… the “Black & Decker” must have been removed.  Duplicates abound, in some cases several dozen different spellings for the same manufacturer, some entered in french such as “Smith et Wesson”, and even a broad category of “OLD 22, Unknown”, to “OLD 303, DOES NOT SAY ON IT”.

Take one of the nearly 700 Springfield name variants.

This is only the data they didn’t redact!

These scraps are a mess, but should prove to yield some interesting data.  For example, overlaying the counts of firearms by type by postal code fragment yields an amusing map of where guns are found throughout the country.

Green = Non-Restricted

Blue = Restricted

Red = Prohibited

http://www.gpsvisualizer.com/display/data/20151012210838-23012-map.html

Now forgive the fact the pins are stacked on top of each other in many cases, I’ve just done a quick and dirty map to bring the data to life somewhat and will be refined in time.

I can think of a bunch of questions to pose of the data, but I’m getting the feeling that this whole billion-dollar-boondoggle was just that, the embodiment of garbage in and garbage out.  Other than curiosity, I don’t feel any results, even with dated data would be worth much at all.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s