I am a driven self-starter who can multi-task while retaining focus in a fast-paced, changing environment. I have 20+ years of experience in both software development and operational positions, often concurrently.
Senior Cloud Engineer
Led several dedicated consulting teams at various clients. Responsible for technical direction, client ops training, system architecture design, and some implementation efforts. Interfaced with all levels of client hierarchy, from upper management to individual contributors.
Responsible for several internal mentoring and training efforts. Wrote and conducted the Go programming segment of on-boarding training curriculum. Mentored several junior consultants on everything from hard technical skills like programming, systems administration, and networking, to soft skills like communication, task prioritization, and problem resolution.
Wrote SHIELD, a backup solution for BOSH deployments, complete with a CLI for automation and a Web UI for operator use. Allowed several clients to effortlessly perform backups of Cloud Foundry deployments.
Wrote Spruce, a YAML multi-tool that features several operators to enable de-duplication and dynamic generation of parts of the final YAML structure. Spruce was a major factor in several initiatives.
Wrote Genesis, a tool that facilitates a BOSH deployment paradigm based on localization of general manifests to more specific environments, allowing re-use of common structure (jobs, releases, properties, etc.) across multiple installations.
Wrote Jumpbox, a utility for outfitting vanilla Ubuntu VMs with all the software and configuration necessary to run BOSH deployments.
Wrote Safe, an alternative CLI for HashiCorp's Vault secure storage solution, with an emphasis on making life easier for operators by providing higher-level functionality (move, rename, full-tree listing, etc.).
Wrote and maintained open source Go libraries:
Wrote several BOSH releases:
Contributed to several other BOSH releases:
Contributed to other OSS projects in support of BOSH / Cloud Foundry:
Principal Systems Engineer
Senior member of a Technical Operations team tasked with developing high-quality application and service health check logic, metric data collection / aggregation systems, and data visualization and analysis tools.
Operationally responsible for systems monitoring over 4,200 hosts and 110,000 service checks, and for curating ~1.2 million performance data metrics across 9 data centers.
Received 3 promotions and several merit bonuses, both during normal review cycles and as recognition for excellence in execution on key initiatives.
Led a four-month effort to build a new monitoring system using Open Source components (Project Hammer Throw). Allowed Synacor to discontinue use of the commercially licensed Groundwork Enterprise Monitoring System, saving approximately $160,000/year. Migrated to the new system in less than four weeks, with no data loss and no downtime.
Wrote IRIS, a custom Icinga Event Broker module in C, enabling the solution to scale to 2x the throughput of Groundwork. Released as Open Source in 2014.
Wrote NLMA, the NLMA Local Monitoring Agent, for scheduling and executing local service check plugins, and feeding the data back to the core monitoring system. Released as Open Source in 2014.
Wrote the NLMA::Plugin framework for writing new Nagios Check Plugins quickly, correctly and with less invested effort. Released as Open Source in 2014.
Wrote image analysis software to generate graph images under both systems and find anomalies indicative of data loss or corruption.
Wrote Synformer, a modern web application that coalesces alerts and graphs from all data centers into a cohesive user interface. Synformer facilitates interaction with monitoring, allowing users to acknowledge problems, schedule downtime, clear alerts, search configuration metadata and review alert history. It includes DashCode, a language for writing custom graph and alert dashboards.
Wrote MAD, a custom rules engine that analyzes alerts in the Synformer database, detects patterns of causality and suppresses symptom alerts in the dashboard display.
Wrote ProcLog, a log processing framework used to stream real-time log data from nginx, Apache, Varnish, Jetty and syslog, aggregating request volume, load time, error rate and other web metrics into the monitoring system.
Wrote monitoring check plugins for gathering metrics from Java/JMX installations, Riak, MongoDB, MySQL, Cassandra, Hadoop, PostgreSQL, Apache, Varnish, nginx, Jetty, OpenLDAP, BIND, Postfix, and many other platforms. Consulted during troubleshooting / diagnostics / RCA due to familiarity with these systems.
All software development work was done within Git revision control, under collaborative code review and some pair programming. Strong foundation of test coverage (90% target across all codebases) and test-driven development practices.
Operational work, including check configuration, package deployments and configuration file management, was carried out exclusively through Puppet, enabling fast stand-up of new and replacement monitoring servers.
Hired on as a Helpdesk Operator in 2006, but quickly demonstrated value and competence in systems design, implementation and administration.
Wrote Ticket Center, a trouble ticket, service request and change communication system to enable the 30+ person IT and Telecom departments to self-manage workload and prioritize requests.
Leveraged the CFEngine configuration management system to enable two system administrators to manage 100+ Linux servers, and still have time to devote to other projects as required.
Implemented Nagios monitoring system for 350+ servers, switches, routers and firewalls, to improve operational visibility.
Championed virtualization of infrastructure with Xen, iSCSI and debootstrap VM deployment automation. Pioneered use of blade server technology to increase server rack density and maximize data center floor space usage.
Replaced aging HP-UX infrastructure by writing a custom file transmissions job framework (in Bash) and migrating 300+ nightly and weekly jobs. Implemented secure, managed sftp bastion hosts and CIFS gateways for interacting with Windows sources / destinations.
Designed and built a custom VoIP PBX system using Asterisk for application functionality (Voicemail, Dial-by-name directory, etc.) and OpenSIPS for routing and registration. Served 250+ internal customers across 5 locations. Wrote custom software for provisioning Grandstream GXP phones via TFTP.
Brought in after the departure of a senior engineer to finish PHP projects and revamp server configuration / web site deployment. Ported legacy code to newer versions of PHP and implemented version control.
Senior Software Engineer
Created the Exponent CMS. Responsible for all technical decisions related to features and functionality of CMS software. Accompanied sales team as a technical resource to discuss potential new business with clients. Helped manage production web, email and DNS servers running Apache, qmail and BIND.
Fast and flexible monitoring system toolkit, providing a metrics collection layer so that system designers and implementors can get past the mundane aspects of monitoring (calculate load, track CPU usage, etc.) and dive right into the truly interesting stuff (curve-fitting, visualization dashboards, etc.).
Configuration management system written in C, with NaCl crypto support and a ZeroMQ transport layer, designed with security and speed as primary objectives. Enables administrators to configure multiple hosts through policy definitions, and then enforces those policies on client systems.
A lightweight testing framework aimed at C programmers practicing TDD. Designed to be expressive, through judicious use of preprocessor macros. Works with the popular prove testing tool from Perl.
A small and efficient local monitoring agent, implemented in portable Perl with few non-core dependencies.
Submitted patches against Icinga 1.7.x (a Nagios fork) for improving performance and increasing reliability and throughput of monitoring systems.
IRIS is a custom Event Broker module that increases monitoring capacity, tripling the throughput of an individual host from ~20,000 results per minute (unpatched) to ~60,000.
A flexible and powerful static site generator. Powers http://jameshunt.us