Note 31: Meet my robot army.

Beat the Robot Overlords by becoming an Overlord of Robots yourself.

Chris J. Karr

31 Mar 2021 — 12 min read

Last week, at one of the Substacks that I follow, David Thornton posted an article about automation and the millions (billions?) of jobs that are due to go POOF! as we build robots and intelligent software agents that work better, cheaper, and faster than the sacks of meat and bone that we call humans. (Martin Ford covers this scenario in his book The Lights in the Tunnel, which you should read.) He closed his discussion with a question about how to handle this, and I was happy to oblige with a comment where I argued that instead of thinking about workers as passive victims of automation, a way out may be thinking of workers as individuals with their own agency who should take the initiative to leverage automation to make themselves better workers. This is a drum I’ve been beating in one form or another for well over a decade, including in my first (and likely last) commencement address at my high school alma mater. (The class of 2014 had no idea what they signed up for inviting me to speak.)

Fortunately, while compiling links to send to Mr. Thornton, I ran across a TED talk that summarizes most of these views effectively. (And with better public speaking skills - sorry DHS Class of ‘14!)

For folks without 12 minutes to watch the video above, Sankar highlights the advantages and successes of the Intelligence Augmentation approach (IA) over traditional Artificial Intelligence (AI). In Sankar’s telling, AI is basically concerned with making devices more capable, whereas in the IA approach humans are the ones enhancing their capabilities (with technology). As humans integrate technology to improve their processes, those improvements lead to increased performance. Increased performance frees up attention and consideration that can be used to identify and exploit additional opportunities to improve the machine in the man-machine symbiosis. Improvement begets increased performance, which begets more improvements, which leads to even more performance increases. Rinse and repeat.

As I pointed out to Mr. Thornton, you can think about this approach as a form of compound interest for personal productivity. Note that this is hardly a new idea, and it has its genesis in the early ‘60s in J.C.R. Licklider’s “Man-Computer Symbiosis” (1960), as well as Douglas Engelbart’s “Augmenting Human Intellect: A Conceptual Framework” (1962). Given that artificial intelligence was effectively founded as its own field of inquiry in 1956 by John McCarthy at the Dartmouth Summer Research Project on Artificial Intelligence, an answer for how humans can meaningfully compete with automation has been with us pretty much all along.

I can write about this topic for longer than anyone has patience, but rather than dive into cyborg chess, Engelbart’s Bootstrapping Strategy, practical modern personal cybernetics, or capital/labor relations (Marxism is fun in this context), I figured that it may be useful to discuss one instance where I’ve applied this strategy in my own working life to build and maintain a product that would simply not exist without an augmented me.

The Fresh Comics Power Suit

If you’re looking for an example of personal augmentation in popular culture, you don’t have to go further than Tony Stark:

Without his suit and its supporting systems, Stark’s a precocious trust-fund brat. With the suit and its systems, he can go head to head with the Hulk, who is 100% “natural” ability. Now, I don’t have a power suit (or a trust fund that would let me build one), but I’ve been constructing a metaphorical power suit for dealing with challenges around Fresh Comics for over a decade now.

To recap, Fresh Comics is a collection of mobile apps and a website that attempts to provide an attractive concise user interface to the latest comic releases, local comic shops, conventions, and more. It’s basically a data aggregation and distillation product that ingests information from a variety of sources to help comic fans keep up with the state of that corner of fandom from week to week.

Pretty simple, right? Not so much. The comic book industry has never been one where people go to obtain wealth. Local shops are run on shoestring budgets, most creators are releasing books as labors of love, and it’s a very organic industry that’s grown up over the decades. Nice well-structured data designed for machines to read isn’t a priority on the industry’s list of concerns. Furthermore, the available data itself is often inconsistent, variably formatted, and full of errors and misspellings. So, how does all of this get pulled into Fresh Comics?

Every data update begins with a simple tab-delimited file just like this.

The first thing that I do is pull down Diamond Comic Distributors’ new releases text files and web pages, which serve as the initial source of raw data. Since manual data entry is a non-starter from an available time and effort standpoint, I created my own program called Fresh Comics Builder, which knows how to pull the available data from a variety of public sources and present that information in a focused user interface that allows me to see everything coming in over a particular period of time.

Fresh Comics Builder does a lot of grunt work on my behalf.
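To make the first step a little more concrete, here is a minimal Python sketch of reading a tab-delimited release file. It is not the Fresh Comics Builder code. The three-column layout (item code, title, price), the `RawRelease` type, and the sample row are all hypothetical, chosen only to illustrate the idea of turning raw lines into records a review interface could display.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawRelease:
    # Hypothetical columns; the real file layout is not described in this note.
    item_code: str
    title: str
    price: str


def parse_release_lines(text: str) -> List[RawRelease]:
    """Parse tab-delimited rows, skipping anything that is not a full record."""
    releases = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("\t")]
        # Blank lines and section headings have fewer than three columns.
        if len(cells) < 3 or not cells[0]:
            continue
        releases.append(RawRelease(cells[0], cells[1], cells[2]))
    return releases


if __name__ == "__main__":
    sample = "SOME PUBLISHER\n\nABC0001\tEXAMPLE COMIC #1\t$3.99\n"
    for release in parse_release_lines(sample):
        print(release)
```

Skipping short rows rather than raising an error is a deliberate choice here: with data this messy, it seems safer to collect what parses cleanly and leave the rest for the human in the loop.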

This is a tool that only I use, and it’s custom-built for ingesting new data into Fresh Comics. It had remained largely unchanged for the past decade, until last spring, when DC Comics decided that they were leaving Diamond for a pair of new distributors. I had to update Fresh Comics Builder to let it know where to look for the relocated DC data, since it was no longer part of the existing sources I used. Given that Marvel announced a similar move last week, it’s quite likely that I’ll be doing something very similar in the months ahead for releases from that company.

Raw data is transformed into a format that I can use my distinctly human abilities to quickly review and correct. Note the dedicated “DC” button added last year in the bottom-left.

In addition to the initial collation of release data, Fresh Comics Builder is a portal to that information, where a human is inserted into the data processing loop. Comic data is notoriously dirty, and it helps to have human eyeballs on it to catch any initial misspellings and bad formatting, or to go out to other sources to fetch missing covers or descriptions. After I finish massaging and reviewing the data, it’s exported in a machine-readable format that I can then ingest into the Fresh Comics Django web application. However, we’re not done yet.

After human eyes review the data, it’s transformed back into another machine-readable format for ingestion.

There are quite a few obvious issues that jump out to the human eye and are easy to fix (missing covers, inserting paragraphs, etc.). However, there is an additional class of subtle issues that I can’t catch alone, and as part of the process, I’ve implemented a couple of additional augmentations to help improve the state of the data.

The first augmentation is running a modified Levenshtein distance algorithm that measures how closely different title and creator names match, in order to catch subtle misspellings and/or reformatting. In this week’s import, it caught that I was trying to import a creator named “Jaime Delano”, when there was already a “Jamie Delano” in the database. It also caught a misspelling of the word “apocalypse” as “apocalpyse” from a publisher who will remain nameless. When a misspelling is corrected, that correction becomes part of the system, so that the same mistake can be fixed automatically in future ingestions. Right now, for creators, that is 1,278 potential corrections that will be made automatically.
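For readers curious what this kind of check can look like, here is a minimal sketch in Python. It uses the classic, unmodified Levenshtein distance, not the modified version mentioned above, and the helper names (`near_matches`, `resolve`), the `corrections` dictionary, and the distance threshold of 2 are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: each insertion, deletion, or substitution costs 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def near_matches(name, known_names, max_distance=2):
    """Known names that are close to `name` without being identical."""
    lowered = name.lower()
    return [known for known in known_names
            if known.lower() != lowered
            and levenshtein(lowered, known.lower()) <= max_distance]


def resolve(name, known_names, corrections):
    """Apply a remembered correction, or return candidates for human review."""
    if name in corrections:
        return corrections[name], []
    if name in known_names:
        return name, []
    return name, near_matches(name, known_names)


if __name__ == "__main__":
    known = ["Jamie Delano"]
    corrections = {}  # hypothetical store of past human decisions
    print(resolve("Jaime Delano", known, corrections))
    corrections["Jaime Delano"] = "Jamie Delano"  # the human confirms the fix once
    print(resolve("Jaime Delano", known, corrections))
```

Swapping two adjacent letters, as in “Jaime” versus “Jamie”, counts as two substitutions under the classic definition, which is why the sketch flags anything within two edits. The corrections dictionary is what lets a one-time human decision be replayed automatically on later imports.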

Making the machine check up on release date changes and cancellations.

After the content is ingested and corrections have been made, the next step of my process is to run a command that takes the release dates from my database and goes out to a variety of sources to determine whether the dates that I currently have remain accurate. In cases where books have been delayed (a frequent occurrence), the algorithm will update the entries with the new release dates. If the release date has been moved up, the command punts that to me for investigation. Sometimes, the original release date was inaccurate, so an earlier release date makes sense. Sometimes this is evidence that a publisher is re-releasing an item and the earlier date belongs to the original release. In these cases, I will remove the duplicate entry from the database. And in some cases, this algorithm identifies when releases have been cancelled altogether, and I remove those from the system.

Putting Levenshtein’s algorithm to work proofreading and enforcing consistency.

Note that while this algorithm is pretty simple, it enables me to keep release dates accurate. If I were to try to reconcile this information without algorithmic assistance, I would spend more time than I have (on the order of days) keeping it accurate.
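The decision rules described for the release date check are simple enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not the actual command: the `Decision` type and the `reconcile` function are hypothetical, and treating “no source lists this item any more” as a possible cancellation is my assumption about how a cancellation might be detected.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Decision:
    action: str                     # "keep", "update", "review", or "cancelled"
    new_date: Optional[date] = None


def reconcile(stored: date, reported: Optional[date]) -> Decision:
    """Compare the stored release date with what outside sources now report."""
    if reported is None:
        # Assumption: a missing listing is a possible cancellation for a human to confirm.
        return Decision("cancelled")
    if reported == stored:
        return Decision("keep")
    if reported > stored:
        # Delays are routine, so they are applied automatically.
        return Decision("update", reported)
    # An earlier date is ambiguous (bad original date or a re-release),
    # so it goes to a human for investigation.
    return Decision("review", reported)


if __name__ == "__main__":
    print(reconcile(date(2021, 4, 7), date(2021, 4, 21)))
    print(reconcile(date(2021, 4, 7), date(2021, 3, 31)))
    print(reconcile(date(2021, 4, 7), None))
```

The asymmetry is the point of the design: the common, unambiguous case (a delay) is handled by the machine, while the rare, ambiguous cases are routed to the person who can tell a corrected date from a duplicate entry.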

One last round of human review helps verify that new data is tagged properly.

The final step of this process is me manually reviewing the listings for the upcoming books to identify and tag any reprints, trade paperbacks, and variant covers so that information can be used to filter within the apps. This involves me sorting upcoming releases by date, title, and identifier, and quickly scanning through entries and tagging ones that are mislabeled. At the end of this process, I fetch affiliate link information from Things from Another World and run another command to add affiliate marketing links (this is where I actually make money). Finally, I push out a newly-compiled SQLite database to the apps, so that my work is available to the mobile users.
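As a rough illustration of how tags like these can end up in a compiled SQLite database and drive filtering, here is a small sketch using Python’s built-in `sqlite3` module. The table layout, column names, and function names are all hypothetical; the note above does not describe the real schema.

```python
import sqlite3


def write_releases(connection, releases):
    """Rebuild a hypothetical `releases` table from reviewed, tagged records."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DROP TABLE IF EXISTS releases")
        connection.execute(
            "CREATE TABLE releases ("
            "identifier TEXT PRIMARY KEY, "
            "title TEXT NOT NULL, "
            "release_date TEXT NOT NULL, "  # ISO dates sort correctly as text
            "is_reprint INTEGER NOT NULL, "
            "is_variant INTEGER NOT NULL)"
        )
        connection.executemany(
            "INSERT INTO releases VALUES (?, ?, ?, ?, ?)",
            [(r["identifier"], r["title"], r["release_date"],
              int(r["is_reprint"]), int(r["is_variant"])) for r in releases],
        )


def regular_titles(connection):
    """Titles an app might show when reprints and variants are filtered out."""
    rows = connection.execute(
        "SELECT title FROM releases "
        "WHERE is_reprint = 0 AND is_variant = 0 "
        "ORDER BY release_date, title, identifier"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    write_releases(connection, [
        {"identifier": "A1", "title": "EXAMPLE COMIC #1",
         "release_date": "2021-04-07", "is_reprint": False, "is_variant": False},
        {"identifier": "A2", "title": "EXAMPLE COMIC #1 VARIANT",
         "release_date": "2021-04-07", "is_reprint": False, "is_variant": True},
    ])
    print(regular_titles(connection))
    connection.close()
```

Storing the tags as plain integer columns keeps the filtering on the app side down to a simple `WHERE` clause.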

Now, I imagine any sufficiently complex and mature data-centric process looks something like this, but I want to stress that the only portion of this system that was explicitly designed was the original app and the XML-based format it used. The rest of the tools described above were not part of the original design of the system, but emerged organically over the years as I built each to address a distinct pain point that was costing me effort or users.

And I’m not finished developing and evolving this system. In the next few months, I’ll have to find new ways to get Marvel data into the system once they leave their current distributor. Fortunately, I already crossed this bridge with DC Comics, so that disruption will be an order of magnitude less labor-intensive than DC’s exodus. (Last year’s labor already is compounding interest!) I’m also working on improving the issue previews that I include with the app. That has been such a scattershot manual process that it’s ripe for automation and augmentation. Each new feature typically inspires its own new algorithms for effective management.

Using this approach for product development has turned an exercise that would eat up entire evenings of mine in 2011 into something that takes me roughly an hour and a half once a week. I would have given up on the project years ago if I were operating it without these additional tools, but now - even with its meager affiliate income - it’s a no-brainer to keep maintaining, given how little I have to invest in it. And it has an ancillary benefit in that some technologies that I’ve developed to make my comic-app life easier have found new life in more serious projects. For example, the GIS system I created to automatically assign the correct time zones to various convention events was adapted to help detect fraud from research study scammers looking to score some easy Amazon gift cards.

That said, I also want to close with a word of caution and discuss the challenges that come with this style of personal productivity. While it’s true that Iron Man’s Hulkbuster armor allowed him to go toe-to-toe with the Hulk, his same instinct to automate led to Ultron leveling a city in Sokovia. While augmentation and automation have been a net positive for my own productivity, there are a few caveats to be aware of:

To do this well, you have to be committed to the co-evolution of the system over time. By its very nature, you’re constructing a personal system in a dynamic world, so this is not a “fire once and forget” way of doing things. You will have to have some slack ready for adapting to any unanticipated changes. You have to accept that at no point is the system likely to be finished.
An idiosyncratic personal technology sphere can be bootstrapped relatively quickly, but it can be very difficult to transplant your personal sphere onto someone else. There’s a lot that you can mentally gloss over if you’re the original system creator that a new user plugged into the system will struggle with. While you can “productize” your system, be prepared for quite a bit of remedial work and documentation to make it usable for other folks. However, if you do need to scale up beyond one person, your own personal technology will be embedded with experience and concrete reference implementations that will be invaluable when you construct a more team-oriented solution.
Augmentation can make you a more capable worker, but while the ongoing labor you put into the system may allow you to fulfill the roles of several non-augmented humans, you will also incur the additional responsibility of those folks whose involvement you have automated away. This has the effect of concentrating know-how and awareness into fewer and fewer minds (lowering the system’s bus factor), so be sure that you are being compensated for the additional responsibility you are assuming, in addition to the value your augmented labor brings.
In a workplace context, augmentation can work in “looser” environments where workers are empowered to select the best tools to do the job. In more constrained environments (such as manufacturing or health care), the menu of available tools will likely be vetted by higher-ups, so there are fewer opportunities to deploy an augmentation strategy. In these cases your best bet is to optimize your performance of the process that’s already been provided. Keep this in mind when deciding whether to deploy this strategy. Note that highly-regimented environments are also those most easily automated (since the processes and tools have already been documented and determined), so if you’re searching for a long-term viable career, avoiding regimented industries might not be a bad bet from a “playing defense against the robots” perspective. That said, if you are the one dictating the processes and the tools, odds are pretty good that you will be in a position to be the General of the Robot Army that replaces the humans who previously did that job. (Something that Uber executives tried and failed to do.)

Since my own livelihood is very low on the standardized tools and processes scale, my overall career strategy remains continuing with the human augmentation approach and building my own little robot army. I’m in an industry that changes frequently enough to expose opportunities for productivity improvements through building my own tools (strategically - avoiding NIH syndrome). As a freelance software developer, there’s no one above me to tell me not to do that, and each bit of labor I can automate away becomes part of my competitive advantage.

Book reports

There Is No Antimemetics Division by qntm (★★★★★): If you’re not familiar with the SCP Foundation, it’s an online collaborative writing project that encompasses horror, humor, and plain weirdness in the form of case files from a shadowy organization that keeps regular folks safe, unaware, and away from the entities described in the files.

This book builds a narrative around a particularly interesting entity, SCP-055:

The story follows a decades-long war between the Foundation’s Antimemetics Division and foreign entities that conquer others through weaponized ignorance. I don’t want to get into the details for fear of spoiling some of the most creative and innovative storytelling I’ve read in some time, but this is definitely worth a read for folks who appreciate things that go bump in the night. (Get a taste here.)

Agents of Dreamland by Caitlín R. Kiernan (★★★★☆): While many folks have done good work crossing the Lovecraftian mythos with modern-day detective and spy novels (Charles Stross for one), this may be the best story that I’ve read that blends straight-up Mythos with a (mostly) modern setting in a compelling and atmospheric way. This is the first of three books, and I’m already working my way through the second one.

In terms of my overall goal, I remain three books ahead of schedule (27 of 100).

Since Substack is telling me I’m near the e-mail length limit, interesting links will return next week, CMDRs.