How to Structure Ongoing Learning in the Software Field

Many software developers have a love/hate relationship with the amount of ongoing learning the profession requires. It can be really enjoyable to learn new things, especially languages and frameworks that are powerful and fun to use. That moment when a big idea crystalizes in the mind is priceless. At the same time it can be fatiguing to watch technologies change, especially the ones you’ve invested so much into.

Given the need to keep up, I’ve concluded that it is ultimately up to the individual.


Employers might pay for training. New projects may offer learning opportunities. Take advantage of those opportunities if you have them, but make sure to steer your own course. Be very aware if your job locks you into a legacy stack or you have settled into a coding rut. That leads to rusty skills and puts you at risk in the job market.

How I keep up in the software field:

1) Set broad goals that balance new interests, wild tangents, and core learning. For example:

A. This year dive into framework/language X. For me a few years ago that was getting back into Python and Django. Really enjoying it. Next on my list is TypeScript.

B. Try out technology Y in the next few months. In 2014 for me that was buying an Oculus Rift DK2. The goal was to build a virtual reality database explorer. It was a bust. Turns out VR technology makes me seasick. Hey, I just can’t code while nauseated! Recently my ‘new toy’ has been Angular 2, which seems pretty well designed and doesn’t make me gag.

C. Take a deeper dive into technologies you feel proficient in. Currently working my way through Fluent Python, which goes into the more obscure parts of the language, but has lots of great gems.

2) Whenever I encounter a syntax, acronym, or technology that I’m not familiar with, I look it up.

Earlier in my career I was having to look up things all the time! Today it is less frequent, but still common. This tactic has served me well.

3) Keep a journal of my learning and make it a habit.

See notes below on Wiser Learning.

4) Apply the knowledge some way – at work, in my blog, on twitter, etc.

The first three are the ‘input’ side of learning. Point #4 is the ‘output’ side, how you apply and make use of what you are doing, which gives you all important experience.

A tool to help with your learning:

I recently launched a site called Wiser Learning to solidify my own learning path and hopefully help others with their journey. Wiser Learning is in the early stages right now. It is completely free for learners. It is geared towards people trying to break into programming and for those who are already established but still learning (which is almost everyone in the field). I’m having a lot of fun doing it.


Right now Wiser Learning is a tool for individuals. The ultimate goal is to get enough developers using it that the best and most relevant learning content will surface on a daily basis. I plan to get a discussion forum going there soon. In any event, I will use Wiser Learning as a systematic way to connect with developers as a mentor.

Recording learning:

You probably don’t realize this, but even if you haven’t picked up a book or watched a video lately, you are constantly learning. Every time you read a blog post or google for how to implement a certain feature you are learning!

Many studies have pointed out the link between journaling and improved learning outcomes. Personally, I’ve found keeping a journal has boosted my motivation to learn. This in itself gives me confidence because I feel like I’m staying on track. Plus when I notice it has been too long since my last entry, I am extra motivated to crack open a book or watch a web video over my lunch break.  Click here to record your learning at Wiser Learning for the first time.

I’d love to hear your feedback about Wiser Learning. I’m also looking for help creating more Learning Graphs, more on those in my next post. Happy learning!

Posted in For New Developers, Work

Ping your sitemap to Google and Bing with a bash script

If you are using a sitemap.xml file, you know you need to submit it to the search engines on a regular basis (say nightly). This is done via a GET request to each search engine’s ‘ping’ URL. Many of the solutions out there for automatically pinging your sitemap.xml file to Google and Bing rely on PHP, Ruby, or another scripting language. On Linux servers, a simple bash script using the commonly preinstalled wget command is sufficient and avoids complexity. As of December 2016, it looks like Google and Bing are the only two search engines that support this. Ask.com took their ping script offline, and Yahoo integrated with Bing. I wound up writing this post because I was annoyed with what I found out there. Django has a built-in one, but it only supports Google and never shows output even with verbosity turned up.

The first thing you need to do is URL encode the URL to your site map.

This can be done using an online URL encoding tool.

The URL to the site map for my blog is:
http://www.laurencegellert.com/sitemap.xml, but the search engine ping URL accepts it as a query string parameter, so it needs to be url encoded.

The URL encoded version of my sitemap url is http%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml.
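
If you would rather not paste your URL into an online tool, the same encoding can be done locally. This is a minimal sketch using Python's standard library; the variable names are just for illustration.

```python
from urllib.parse import quote

sitemap_url = "http://www.laurencegellert.com/sitemap.xml"

# safe="" makes quote() escape ":" and "/" as well, which is what a
# value placed inside a query string parameter needs.
encoded = quote(sitemap_url, safe="")
print(encoded)  # http%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml
```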

Use the URL encoded version of your sitemap url in the script below where indicated.

Place this script somewhere on your system, named ping_search_engines.sh.

#!/bin/bash
echo "-------------  Pinging the search engines with the sitemap.xml url, starting at $(date): -------------------"

echo "Pinging Google..."
# The URLs are quoted so the shell does not treat the ? as a glob character
wget -O- "http://www.google.com/webmasters/tools/ping?sitemap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "Pinging Bing..."
wget -O- "http://www.bing.com/ping?siteMap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "DONE!"

The -O- part tells wget to pipe the output to standard out, instead of a file. That means when you run it manually it displays the output on screen. Wget’s default behavior is to save the returned data to a file. A less verbose mode is -qO-, which hides some of wget’s output, but I prefer to have all that information in the log.

Run chmod +x ping_search_engines.sh so the file is executable.

Add the following cron entry, which will trigger the script every night at 1am:

0 1 * * * /path/to/ping_search_engines.sh >> ~/cron_ping_search_engines.log 2>&1

This script is a good initial way to get going for a simple website. For heavy duty websites that are mission critical, or that your job relies on, I’d take it a few steps further:

  • Locate the cron log output in a directory that gets logrotated (so the log file doesn’t get too big). The output is small so even after running it for a year or more the file won’t be that large, but like all log files, it should be set up to auto rotate.
  • Scan for absence of the 200 response code and alert on failure.
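
One way to approach the "alert on failure" idea is to make the ping from a script that can inspect the status code directly, instead of scanning the wget log. The sketch below is an assumption on my part, not part of the bash script above: the function names are hypothetical, and what you do with the list of failures (email, chat message, etc.) is left to you.

```python
import urllib.error
import urllib.request


def ping_ok(url, opener=urllib.request.urlopen):
    """Return True only if the ping URL answers with HTTP 200."""
    try:
        with opener(url, timeout=30) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, timeouts, and non-2xx responses,
        # which urlopen raises as HTTPError (a URLError subclass).
        return False


def failed_pings(urls, opener=urllib.request.urlopen):
    """Return the ping URLs that did not come back with a 200."""
    return [url for url in urls if not ping_ok(url, opener)]
```

The opener parameter exists only so the check can be exercised without touching the network. In real use, pass the two ping URLs from the script and alert if the returned list is not empty.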

Posted in Sys Admin

Backup to AWS S3 with multi-factor delete protection and encryption

Having secure automated backups means a lot to me. This blog post outlines a way to create encrypted backups and push them into an AWS S3 bucket protected by MFA and versioning, all with one command.

There are four parts to getting the backup big picture right:

Step 1 – secure your data at rest

If you don’t secure your data at rest, all it takes is physical access to get into everything. The first thing I do with a new machine is turn on whatever built in encryption is available. MacOS has FileVault. Ubuntu offers disk encryption plus home folder encryption. With the speed advantages of SSDs I don’t notice a performance penalty with encryption turned on.

Step 2 – backup to an external hard drive

On Mac I use Time Machine with an encrypted external HDD as a local physical backup.

Step 3 – push to the cloud

In case of theft, my backups are encrypted and then pushed to the cloud. There are lots of cloud backup providers out there, but AWS S3 is an economical option if you want to do it yourself. The setup is much easier than it used to be!

As of November 2016, S3 costs about $0.03 per GB per month for standard storage, but you can get it down even more by using Infrequent Access storage or Glacier storage. Since I’m not backing up that much, maybe 20GB, it is a significant savings over services that charge $10/month. Most importantly I have more control over my data, which is priceless.

Step 4 – verify your backups

Periodically, restore your backups from Steps 2 and 3. Without verifying your backups, you have no backups; you just think you do.
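
A full restore is the real test, but a quick sanity check can catch a damaged archive early. The following is a rough sketch, assuming the backup is a .tar.gz file that has already been decrypted; the function name is hypothetical.

```python
import tarfile


def archive_file_count(path):
    """Read every regular file in a .tar.gz archive and return how many there are.

    Reading each member to the end forces the data to be decompressed,
    so a truncated or corrupted archive raises an exception here.
    """
    count = 0
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if member.isfile():
                handle = archive.extractfile(member)
                # Read in 1 MB chunks so large files do not fill memory.
                while handle.read(1024 * 1024):
                    pass
                count += 1
    return count
```

A count of zero, or a count far below what you expect, is a good reason to look closer before you need the backup for real.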

Overview for getting AWS S3 going:

  1. Enable AWS MFA (multi-factor authentication) on the AWS root account.
  2. Create a bucket that is versioned, private, encrypted, and has a 1 year retention policy.
  3. Setup a sync job user in AWS IAM (identity and access management).
  4. Install AWS CLI tools, authenticate as the user created in step 3, and start syncing data!
  5. At this point you can sync your files unencrypted and rely on S3’s encryption (which requires some extra configuration of the IAM user and s3 sync command), or you can make tar files and encrypt those with something like gpg (linux, Mac) and then push those to the cloud. This article explains the latter.

With this setup, if files were deleted maliciously from S3, the root account can go into the bucket’s version history and restore them. That in turn requires the password PLUS the authenticator device, making it that much more secure.

ALERT / DANGER / DISCLAIMER: I’m not a security expert, I’m an applications developer. If you follow this guide, you are doing so at your own risk! I take zero responsibility for damages, lost data, security breaches, acts of god, and whatever else goes wrong on your end. Furthermore, I disclaim all types of warranties from the information provided in this blog post!

Detailed AWS Bucket Setup with Sample Commands:

1) Enable AWS MFA

AWS root logins can be set up with multi-factor authentication. When enabled, you log in with your password plus an authenticator code from a physical device (like your phone). The app is free. In theory this protects from malicious destruction of your data in case your root password is compromised. The downside is, if you lose the authenticator device, you have to email AWS to prove who you are to get it re-issued. ENABLE MFA AT YOUR OWN DISCRETION. Make sure to read the docs and understand how MFA works.

To enable MFA, login to your root AWS account, click on the account menu in the upper right and go to ‘Security Credentials’. You’ll need to install an authenticator app on your smart phone and scan a QR code. The instructions will walk you through it.

2) Create a new S3 bucket

Navigate to the ‘S3’ module, and create a new bucket.

  1. Pick a region that is on a different continent for maximum disaster recovery potential. I went with Ireland for price and geographic reasons, there are many options to choose from.
  2. Under Properties, Versioning, enable versioning on the bucket.
  3. If you want to automatically clean up old deleted files, you might setup a rule under Lifecycle, Add Rule, Whole Bucket, Action on previous version:
    1. Check Permanently Delete – 365 days after becoming a previous version
    2. Check “remove expired object delete marker”


3) Create a sync job user.

A new user with a very narrow permission set will be used to backup your data into the bucket you just created. The sync user is only able to read/write to the S3 bucket and nothing else. Importantly, the sync user is not allowed to delete buckets!

Under AWS’s Identity and Access Management (IAM), add a new user ‘sync-user’. This user does have delete access, but the bucket is versioned so the data is still there just flagged as deleted.  Make sure to save the access key and secret key it generates somewhere safe like your KeePass file.

Give the new user the following custom inline policy. Click on the newly created user, go to the Permissions tab, expand Inline Policies, click Create User Policy, select Custom Policy. Name it something like ‘backup-policy’.

For the Policy Document, copy the following verbatim, except the bucket name. Replace BUCKET_NAME_HERE, which appears in two places, with the name of your bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME_HERE"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME_HERE/*"
            ]
        }
    ]
}
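
Since the bucket name has to be replaced in two places, it can be less error prone to generate the policy document. This is only a sketch: build_backup_policy is a hypothetical helper and "my-backups" is a placeholder bucket name. The output is the same JSON as above with the name filled in.

```python
import json


def build_backup_policy(bucket_name):
    """Return the sync user's inline policy as a dict for the given bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object operations apply to everything inside the bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


print(json.dumps(build_backup_policy("my-backups"), indent=4))
```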

4) Get AWS CLI setup and start syncing!

Amazon provides a set of command line tools called AWS CLI. One of the commands is s3 sync which syncs files to an S3 bucket, including sub directories. No more writing your own script or using a 3rd party library (finally).

  1. To install AWS CLI on Mac:
    sudo pip install awscli --ignore-installed six

    See the official AWS CLI install instructions page for more details.

  2. Configure AWS CLI: In a terminal run the following:
    $ aws configure

    Paste in the keys from the user created in step 3 above.

To sync a folder to the bucket, run the following command in the terminal. Replace BUCKET_NAME_HERE with your bucket’s actual name.

$ aws s3 sync ~/folder s3://BUCKET_NAME_HERE/folder --delete

Read more on the aws s3 sync command here.

5) A sample script that tars everything, encrypts, then copies to S3:

To get this working, you’ll need to install gpg and setup a public/private key with a very strong password. Make sure to backup that key. I have an article on GPG here.

#!/bin/bash
date=`date +%Y-%m-%d`
echo -------------  Backup starting at $(date): -------------------


echo "Copying files from my home folder, Desktop, etc into ~/Documents/ so they are part of the backup "
cp -r ~/.ssh ~/Documents/zzz_backup/home/ssh
cp -r ~/Desktop/ ~/Documents/zzz_backup/Desktop
cp ~/.bash_profile ~/Documents/zzz_backup/home/bash_profile
cp ~/.bash_prompt ~/Documents/zzz_backup/home/bash_prompt
cp ~/.path ~/Documents/zzz_backup/home/path
cp ~/.gitconfig ~/Documents/zzz_backup/home/gitconfig
cp ~/.my.cnf ~/Documents/zzz_backup/home/my.cnf
cp ~/backups/make_backup.sh ~/Documents/zzz_backup/home/

echo "Clearing the way in case the backup already ran today"
# -f keeps rm quiet when there is nothing from today to remove
rm -f ~/backups/*"$date"*


echo "Making archives by folder"
cd ~/

echo "    Documents..."
tar -zcvf ~/backups/$date-documents.tar.gz ~/Documents
# .. you may want to backup other files here, like your email, project files, etc


echo "GPGing the tar.gz files"
cd ~/backups
gpg --recipient {put your key name here} --encrypt $date-documents.tar.gz
# ... again, add more files here as needed

# NOTE: to decrypt run gpg --output filename.tar.gz --decrypt encryptedfile.tar.gz.gpg

echo "Removing the tar.gz files"
rm $date*.tar.gz

echo "Syncing to S3"
/usr/local/bin/aws s3 sync ~/backups s3://my-backups/backups --delete

echo Done!

Posted in Sys Admin

Django Group By Having Query Example

Example Group By Having query in Django 1.10.

Let’s say you have SQL such as:

SELECT DISTINCT user_id, COUNT(*)
FROM my_model 
WHERE tag_id = 15 OR tag_id = 17
GROUP BY user_id 
HAVING COUNT(*) > 1

The above query finds all the user_ids in the my_model table with both tag id 15 and 17 (assuming each user has at most one row per tag).
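
To see what that SQL returns without setting up a Django project, here is a small sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and I've added an ORDER BY so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (user_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO my_model (user_id, tag_id) VALUES (?, ?)",
    [
        (1, 15), (1, 17),           # user 1 has both tags
        (2, 15),                    # user 2 has only tag 15
        (3, 17),                    # user 3 has only tag 17
        (5, 15), (5, 17), (5, 99),  # user 5 has both, plus an unrelated tag
    ],
)

rows = conn.execute(
    """
    SELECT user_id, COUNT(*)
    FROM my_model
    WHERE tag_id = 15 OR tag_id = 17
    GROUP BY user_id
    HAVING COUNT(*) > 1
    ORDER BY user_id
    """
).fetchall()

print(rows)  # [(1, 2), (5, 2)]
```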

In Django a Group By Having query is doable, but you have to follow the Django way:

from django.db.models import Count

MyModel.objects.filter(tag_id__in=[15,17])\
.distinct()\
.values('user_id')\
.annotate(user_count=Count('user_id'))\
.filter(user_count__gt=1)\
.order_by('user_id')

Breaking it down:

MyModel.objects.filter(tag_id__in=[15,17])

This should look familiar, __in is a nice shortcut for an OR clause, but that isn’t material to the GROUP BY / HAVING part at all.

.distinct()

Adds the DISTINCT clause. It may not be relevant in your case. I’m doing it to be clear that I don’t want to see the user_id more than once.

.values('user_id')

This is where you specify the column of interest.

.annotate(user_count=Count('user_id'))

The annotate call sets up a named key and specifies the aggregate SQL function (COUNT, SUM, AVG, etc). The key user_count is arbitrary, you can name it whatever you want. It gets used in the next line and is the row dictionary key for the count value (think SELECT COUNT(*) as user_count).

.filter(user_count__gt=1)

This is what makes the HAVING clause appear in the SQL that gets executed. Note that user_count__gt is matching the named key created on the previous line and filtering it for values greater than 1.

.order_by('user_id')

The example SQL above doesn’t have an ORDER BY, but I’ve added it here to point out a quirk in Django. If order_by is left off Django will add the model’s default sort column to the SELECT and GROUP BY sections of the query. This will screw up your results. So, when doing a Group By / Having query in Django, it is a good idea to explicitly specify a sort order even if it doesn’t matter in a logical sense.

It will return something like:

<QuerySet [{'user_count': 2, 'user_id': 1L}, {'user_count': 2, 'user_id': 5L}]>

Trying to get your query working in the shell? This might help: how to turn on SQL logging in the Django shell.

Posted in Code

Django enable SQL debug logging in shell how to

How to get Django to show debug SQL logging output in the shell. Works in Django 1.10!

Start the Django shell:

python manage.py shell

Paste this into your shell:

import logging
log = logging.getLogger('django.db.backends')
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler())

The last line log.addHandler(logging.StreamHandler()) may not be needed if you already have a StreamHandler in your logging config. Worst case, you’ll see the SQL twice.

Next time you run a QuerySet it will show the SQL and the regular result:

>>> from django.db.models import Count
>>> MyModel.objects.filter(tag_id__in=[15,17])\
.distinct().values('user_id').annotate(user_count=Count('user_id'))\
.filter(user_count__gt=1).order_by('user_id')

DEBUG (0.001) SELECT DISTINCT `my_model`.`user_id`, COUNT(`my_model`.`user_id`) AS `user_count` FROM `my_model` WHERE `my_model`.`tag_id` IN (17, 15) GROUP BY `my_model`.`user_id` HAVING COUNT(`my_model`.`user_id`) > 1 ORDER BY `my_model`.`user_id` ASC LIMIT 21; args=(17, 15, 1)

<QuerySet [{'user_count': 2, 'user_id': 1L}, {'user_count': 2, 'user_id': 5L}]>

Posted in Code

Quick guide to freeing up disk space in Ubuntu 16.04

Ubuntu retains apt package downloads. It also installs all locales (language packs) by default. These can be cleaned out to reclaim disk space. This should come in handy on VPS servers, especially the smaller ones backed by SSDs. For example the smallest Linode server currently available is a 24GB SSD so every free MB counts. On larger servers, it probably isn’t worth it though. I was able to reclaim about 200MB in my case.

In summary I ran the following:

sudo apt-get install localepurge
sudo apt-get clean
sudo apt-get autoremove
sudo reboot

After initial setup of a new Ubuntu app server it was using 1.86 GB. Then after running the above commands it was using 1.66 GB, so about 200MB was freed up. May not be worth your while and your mileage may vary (YMMV)!

Original Disk Usage:

laurence@appserver:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       24505644 1960172  21283904   9% /
...

Resulting Disk Usage:

laurence@appserver:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       24505644 1743556  21500520   8% /
...

About localepurge:
Localepurge is a tool that gets rid of all the language packs you don’t need (because you can’t read 400+ languages). Run it at your own risk!

It defaults to keeping en-US packages, but loads a dumb terminal style menu that lets you pick the languages / locales you want to keep. Have fun wading through the hundreds of options in the menu if you want to change the settings. Use the tab key to get to the <ok> button.


It requires a reboot after being installed to finish the cleanup process.

Before using it, please read more about localepurge here: https://help.ubuntu.com/community/LocaleConf

About apt-get clean:

Ubuntu saves packages downloaded through apt before they are installed. This can start to add up.

To see how much space the cached packages are using run:

sudo du -sh /var/cache/apt/archives

Mine was only using 32MB at the time, but the command freed it up completely.

To delete the cached packages:

sudo apt-get clean

You’ll probably want to run that command periodically, or each time after running `apt-get upgrade`.

About apt-get autoremove:

This command cleans up packages that were installed as dependencies to packages that are no longer installed.

Eg, if package A required package B and C, but you uninstalled package A, package B and C would still be around. Calling autoremove will ‘take care’ of package B and C.

On a freshly installed server, you probably won’t free up much space with this command but if your server has whiskers it may help.

Posted in Sys Admin

Webservice API design tips – correct pagination and exposing deleted rows

After working with dozens of REST, SOAP and ‘ad-hoc’ web services / APIs I’ve noticed a similar set of design problems by companies big and small. One gotcha I almost always see left out of an API is an easy way to determine which records were deleted or moved on the backend. Another gotcha is implementing pagination and sorting in a helpful way. This includes ‘feed’ style listing APIs where the data under the API is changing constantly. I’ll explain the solutions below after a brief introduction.

Overview of a ‘books’ listing endpoint:

Let’s say we have a web app, a native iOS app and a 3rd party system that need to look-up books in a database. A RESTful API is perfect for this!

Let’s make the API a decent one by allowing keyword filtering, pagination, and sorting.

# listing of book records, default sort, page 1 implied, default page size of 10
GET /books
{
record_count: 24178,
page: 1,
results: [
	{title: "Calculus 1st Edition", publisher: "Mathpubs", id: "15878"},
	{title: "Geometry 4th Edition", publisher: "Heath", id: "65787"}
	....
]
}
# listing of book records that contain 'python' as a search match
GET /books?q=python
{
record_count: 147,
page: 1,
results: [
	{title: "Python", publisher: "O'Reilly", id: "74415"},
	{title: "Fluent Python", publisher: "O'Reilly", id: "99865"}
	....
]
}
# listing of book records, sorted by title
GET /books?sort=title
{
record_count: 24178,
page: 1,
results: [
	{title: "Aardvark's Adventures", publisher: "Kids books", id: "124789"},
	{title: "Aardvark's Explained", publisher: "Zoolabs", id: "988741"}
	....
]
}
# get the 10 most recently updated books related to python
# note the minus (-) sign in front of updated_at, that is a Django convention but in your API do it however you want, perhaps better to specify it as "NewestFirst", just keep it consistent
GET /books?q=python&sort=-updated_at&page_size=10
# next get the 11 - 20 most recently updated books related to python
GET /books?q=python&sort=-updated_at&page_size=10&page=2

My notes on sorting a webservice listing endpoint:

  • By default, sort the results by something natural like title or date created if the sort parameter isn’t supplied.
  • Allow the client to specify a sort order. Validate the sort order they provided against a list of options the server allows. If it is invalid, return a 400 error (bad request) with a message explaining why.
  • An essential sort order option is the time a record was last updated, newest first (typically updated_at desc). With that sort option a client can crawl through the pages until it hits a date already processed and stop there. So many APIs I’ve worked with overlook sorting by updated_at desc. Without the updated_at desc sort option a client is forced to crawl the entire listing to find anything new or updated. This is very inefficient for large databases with a relatively small number of regular changes or additions.
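
The sort validation described in the notes above might look something like this. It is a framework-neutral sketch: ALLOWED_SORTS, BadRequest, and parse_sort are hypothetical names, and the leading minus sign for descending order follows the convention used in the example requests.

```python
ALLOWED_SORTS = {"title", "-title", "updated_at", "-updated_at"}
DEFAULT_SORT = "title"


class BadRequest(Exception):
    """Raised for invalid client input; the web layer maps it to HTTP 400."""
    status_code = 400


def parse_sort(params):
    """Return (column, descending) for the request's sort parameter."""
    sort = params.get("sort", DEFAULT_SORT)
    if sort not in ALLOWED_SORTS:
        # Never pass an unchecked value through to the ORDER BY clause
        raise BadRequest(
            f"invalid sort '{sort}', allowed values: {sorted(ALLOWED_SORTS)}"
        )
    descending = sort.startswith("-")
    return sort.lstrip("-"), descending
```

Checking against a fixed list also means the client's input never reaches the query as raw text.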

My notes on paginating a webservice listing endpoint:

If your data set has more than say, 10 rows, adding pagination is a good idea. For very large data sets it is essential because too much data in a request can crash the server or the client.

  • Implementing pagination is a matter of the proper LIMIT / OFFSET queries on the backend, though that varies by ORM and data store.
  • One annoying thing that may dissuade you is, the server should return the total count of records that match in addition to returning the slice of rows that match the current page and page size. This is so the appropriate page links {1,2,3,4…} can be generated. Getting the overall count of matches can be a performance hit because it involves an extra query. If you want solid pagination, you just have to bite the bullet in terms of the count query.
  • The client should be able to tell the backend the page size it wants, but it should be validated (say between 1 and 100 most of the time).
  • Really good REST frameworks like Django-Rest-Framework offer ‘next-page’ and ‘previous-page’ URLs inside the JSON response – very handy for paging!
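
Here is a rough sketch of the pagination rules above. To keep it runnable on its own, an in-memory list stands in for the database: the slice plays the role of the LIMIT / OFFSET query and len() plays the role of the count query. The function name and the limit of 100 are assumptions for illustration.

```python
def paginate(rows, page=1, page_size=10, max_page_size=100):
    """Return one page of rows plus the total count needed for page links."""
    if not 1 <= page_size <= max_page_size:
        raise ValueError(f"page_size must be between 1 and {max_page_size}")
    if page < 1:
        raise ValueError("page must be 1 or greater")

    offset = (page - 1) * page_size
    return {
        "record_count": len(rows),                   # the extra count query
        "page": page,
        "page_size": page_size,
        "results": rows[offset:offset + page_size],  # LIMIT page_size OFFSET offset
    }
```

With 25 matching rows and a page size of 10, page 3 holds the last 5 rows and record_count stays at 25 on every page.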

My notes on paginating a ‘feed’ style listing:

Some data sets are a lot more wild than books and change constantly. Let’s take the example of a twitter style feed, where bots, celebrities, teenagers, and software developers waiting on unit tests are tweeting their heads off in real time.

In this case, the database needs to organize records by a natural sort. Twitter has the concept of an ‘id’ that is sortable. Yours might be the updated_at flag or some naturally sorting hash that goes on each record (maybe the primary key). When the client loads the feed, the first call asks for a page of data with a given number of rows (say 50). The client notes the maximum ID and the minimum ID it got (typically on the first and last rows respectively). For the next API call, the minimum ID gets passed back to the server. The server then returns the next 50 rows after the minimum ID value the client saw. The server could also return the number of ‘new rows’ on a periodic basis with an ID higher than the maximum ID the client initially got. It has to be done this way because while the user was reading their tweets and scrolling down, it is possible many new tweets were created. That would cause everything to slide down and screw up traditional pagination.
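
The feed approach can be sketched with a plain list standing in for the database. This assumes each record has a sortable integer id; feed_page, count_newer, and the before_id parameter are hypothetical names, not Twitter's API.

```python
def feed_page(rows, page_size=50, before_id=None):
    """Return up to page_size rows, newest first, with ids below before_id."""
    candidates = [
        row for row in rows
        if before_id is None or row["id"] < before_id
    ]
    candidates.sort(key=lambda row: row["id"], reverse=True)
    return candidates[:page_size]


def count_newer(rows, since_id):
    """Return how many rows arrived after the newest id the client has seen."""
    return sum(1 for row in rows if row["id"] > since_id)
```

The client passes the minimum id from its last page as before_id. Rows that arrive in the meantime have higher ids, so they never shift the next page the way they would with page numbers.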

Twitter has a more in depth tutorial here:
https://dev.twitter.com/rest/public/timelines

What about deleted or moved records??

Getting at deleted records in an API is a practical problem I’ve had to solve several times. Think of a case where a background process scrapes an API and keeps tabs on what changes. For example, social media posts or content records in a CMS.

Let’s say an hour ago, the listing API was scanned and all data was retrieved and our copy is in perfect sync with the other side. Now imagine the book with ID 789 gets deleted on the server. How do we know that 789 got deleted?

Invariably, I have to ask the people who made the API and they write back and say something like, “it can’t do that, you have to page through the entire set of data or call for that individual book by ID”. What they are saying is, on a regular basis do a full scan of the listing, compare that to what you have, and anything you have that the server doesn’t was deleted on the server.

This situation is particularly painful with very large data sets. It can make nightly syncs unfeasible because there is just too much data to verify (rate limits are quickly exceeded or the sheer amount of processing time is too high). Let’s say you are forced down that road anyway. You have to be very careful when triggering deletes on your side since a glitch in the API could cause accidental deletes on your side. In this scenario when the API goes down or responds with an empty result set the scraping program might think “great I’ll delete everything on this side just like you asked since it looks like nothing exists anymore!”. To prevent that kind of disaster, in the past I’ve limited the maximum number of deletes per run and alerted when it found an excessive number of deletes.
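
The delete safeguard can be sketched as follows. The names and the default limit are hypothetical, and the alerting is left to whatever catches the exception.

```python
class TooManyDeletes(Exception):
    """Raised when a sync run wants to delete a suspicious number of records."""


def plan_deletes(local_ids, remote_ids, max_deletes=100):
    """Return the ids to delete locally, refusing if there are too many.

    Anything we hold that the remote listing no longer returns is
    treated as deleted on the server.
    """
    missing = set(local_ids) - set(remote_ids)
    if len(missing) > max_deletes:
        # An outage or an empty response looks exactly like "everything
        # was deleted", so stop and let a human take a look.
        raise TooManyDeletes(
            f"{len(missing)} deletes requested, limit is {max_deletes}"
        )
    return missing
```

With this in place, an empty response from a broken API raises an error instead of wiping the local copy.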

Fundamentally a RESTful API isn’t a great way to mirror data that changes all the time. The reality is, often it is all you have to work with, especially given mobile apps and cross platform connectivity, security requirements, etc.

Here is what I do regarding server side deletion of records in a listing API:

First of all, as a general principle, I almost never design a database to allow immediate physical deletion of records. That is like driving without a seat belt. Instead, I add a deleted column with type tinyint/bool/bit default 0 to every single table. The front end and all APIs are programmed to filter out deleted rows. This way, if something is accidentally deleted, it can easily be restored. If a row has been deleted for more than a given period of time, say 12 months, a cleanup script will pick it up and physically trash it and associated child rows out of the database. Remember – disk space is cheap but data loss is costly.
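
As a concrete sketch of the deleted column idea, here it is with Python's built-in sqlite3 module. The table layout, the date_deleted column, and the function names are assumptions for illustration. Timestamps are stored as ISO 8601 strings so they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0,
        date_deleted TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [(1, "Algebra II"), (2, "Fluent Python")],
)


def soft_delete(conn, book_id, when):
    """Flag the row as deleted instead of physically removing it."""
    conn.execute(
        "UPDATE books SET deleted = 1, date_deleted = ? WHERE id = ?",
        (when, book_id),
    )


def visible_books(conn):
    """What the front end and the APIs should see: undeleted rows only."""
    query = "SELECT title FROM books WHERE deleted = 0 ORDER BY id"
    return [row[0] for row in conn.execute(query)]


def purge_deleted_before(conn, cutoff):
    """Cleanup job: physically remove rows flagged as deleted before cutoff."""
    cursor = conn.execute(
        "DELETE FROM books WHERE deleted = 1 AND date_deleted < ?",
        (cutoff,),
    )
    return cursor.rowcount


soft_delete(conn, 1, "2016-08-20T18:25:43Z")
```

Restoring an accidentally deleted row is then a single UPDATE that sets deleted back to 0.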

Another way to do this is to keep a DeletedBooks table. Whenever a Book is deleted, make an entry in that table via a trigger or hook or whatever your framework fires off after a record is deleted. I don’t like that as much as the deleted bit column solution because with hooks / triggers things get complicated and data loss can happen unless they are truly ‘transactional’. However, a DeletedBooks table may be easier to put in place in a legacy system that constantly stymies your efforts to make a good API.

Now that our data layer has knowledge of deleted records, we can add a new endpoint for deletes that only returns books that were deleted. This API should be paginated, allow filtering, etc. Note that it includes a date_deleted field in the results, which may be useful to the client. In most cases date_deleted may be substituted for updated_at.

# listing of deleted book records!
GET /books_deleted
{
record_count: 50,
page: 1,
results: [
	{title: "Algebra II", id: "29898", date_deleted: "2016-08-20T18:25:43.511Z"},
	{title: "Trig for Kids", id: "59788", date_deleted: "2016-08-17T07:54:44.789Z"},
	....
]
}

You could also add a deleted parameter to the original listing API to filter for deleted records:

GET /books?deleted=1

A similar implementation can be created for records that disappear for whatever reason – moved to a different account, re-classified, merged, or tossed around like rag dolls. The basic idea is to expose data so clients can decipher what the heck happened instead of having to page through the entire listing API to piece it together.

All the other ‘best practices’ for REST APIs:

If you’ve read this far you are probably committed to building a good API. Thank you. It is a thankless job like many in ‘backend’ software, but let me again say Thank You. Unfortunately, people usually don’t notice when things go smoothly, but a bad API is very easy to notice. Perhaps a few developers have suffered permanent IQ degradation from being forced to write code against poorly designed, undocumented, and janky APIs. Together, we can ensure this is a thing of the past.

All the docs I’ve read say a good API should emit JSON and XML. Your framework should handle that for you, so I won’t say anything more about that.

Eg:

GET /books.json -> spits out JSON
GET /books.xml -> spits out XML

Successful requests should also return the HTTP status code 200.

Here are some other status codes you’ll want to use in your API.

  • 400 – bad request (inputs invalid, something screwed up on their end)
  • 401 – unauthorized (user is not authenticated; if they are authenticated but can’t access this particular thing, 403 – forbidden is the better fit)
  • 404 – not found (just like a page not found error on the web)
  • 405 – method not allowed (eg, client tried to POST to an endpoint that only allows GET requests)
  • 500 – internal server error (something screwed up on your end, I hope you logged the details?)
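A small sketch of how a request handler might choose between these codes, using the standard library's http.HTTPStatus. The function names, the parameters, and the order of the checks are assumptions for illustration; your framework likely does much of this for you.

```python
from http import HTTPStatus


def pick_status(method, allowed_methods, authenticated, found, valid):
    """Map the outcome of a request to one of the status codes listed above."""
    if method not in allowed_methods:
        return HTTPStatus.METHOD_NOT_ALLOWED   # 405
    if not authenticated:
        return HTTPStatus.UNAUTHORIZED         # 401
    if not found:
        return HTTPStatus.NOT_FOUND            # 404
    if not valid:
        return HTTPStatus.BAD_REQUEST          # 400
    return HTTPStatus.OK                       # 200


def safe_call(handler):
    """Run a handler and turn any unexpected exception into a 500."""
    try:
        return handler()
    except Exception:
        # Something screwed up on our end; a real API would log the details here.
        return HTTPStatus.INTERNAL_SERVER_ERROR
```

HTTPStatus members compare equal to plain integers, so pick_status(...) == 405 works if the rest of your code deals in numbers.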

For a complete list of HTTP status codes see:
http://www.restapitutorial.com/httpstatuscodes.html

Other good tips I’ve seen include: version your API, use verbs correctly (GET, POST, DELETE, PUT, …), use SSL, document it, etc.

For more best practices involving RESTful APIs see:
http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api
http://blog.mwaysolutions.com/2014/06/05/10-best-practices-for-better-restful-api/


How to Deftly Handle Pushy People and Succeed on Software Projects

Working in the software profession you will often run into a situation where someone is pushing your estimates down, asking for the moon, or increasing scope without adding resources or time. There is no instruction manual for handling that sort of thing. The only thing you can really control in that situation is how you respond. I’ve learned a few skills over the years that help.

The fundamental law governing all projects – the Iron Triangle:

[Image: the iron triangle]

Probably the most widely accepted concept from project management is the ‘iron triangle’. The corners are:

  • Scope – what does the project consist of? Includes QUALITY of the work!
  • Schedule – when will the project be released?
  • Cost/resources – how many dollars and/or hours are available?

The law of the iron triangle: when one corner is reduced at least one other corner must expand to compensate.  Every conceivable way to cheat this law has been tried.

Another way of looking at the iron triangle is: better, faster, cheaper – pick two and you’ll be fine.

Feeling the squeeze and reacting with empathy:

Normally balancing the triangle is in direct conflict with a project being seen as profitable or successful by at least one if not multiple stakeholders on a project. Invariably someone is going to want to push on one of the corners. You’ll be asked if you can do it sooner, or if you can do it with one less team member, or if you can launch it with just one extra feature. The important thing is not to take it personally. It may be their job to do so; perhaps a performance evaluation or paycheck is on the line. From the famous book Getting to Yes, one of the things that has always stuck with me is separating the people from the problem.

The naive and closed off way of thinking of those who push you around might be:

  • The CEO with the personality disorder
  • The sales person who lied about the system’s functionality
  • The project manager gunning for a promotion
  • The team member who insists on using language X
  • The team member who insists on staying with older technology

Instead of using labels, the wiser path is to see them as people who have shared goals with you, who want what they think is most important.

  • The CEO who is self-assured and wants to ‘win’ (which is good for you)
  • The sales person who is always upbeat and optimistic (without sales, the software is irrelevant)
  • The project manager who bases their self-worth on their accomplishments
  • The brilliant and eager developer who wants to use language X
  • The experienced and cautious developer who trusts what they know

Negotiation skills for getting out of the squeeze:

For most senior software professionals, it is second nature to refer back to estimates, bring up concerns of quality, or tactfully point out how rushing in the short term leads to bad outcomes later on. If all you offer is ‘pick two’, or ‘it is impossible’, you are right, BUT whoever you are talking to is coming to you for a solution not a dismissal. Here are some techniques that have helped me deftly get out of pressure situations while making the customer happy:

a) Soft launch / beta release: Release the software in a partially complete state to a very small number of trusted users. Finish up everything else as more users come on board. This allows the schedule to flex some, and possibly even the resources, but keeps the scope intact.

b) Start a road map: Set up a long term plan (1-2 years) which covers all the necessary features. Illustrate any changes to resources, scope or schedule on the road map so the trade offs are apparent. One advantage of having a road map is that everyone in the company can set up the 1.0 features in a way that leaves room for the 2.0 features down the line. Of course, leave room to clean up bugs and technical debt along the way and make it clear that when these get left out they will come back to steal resources later on.

c) Primary deliverables and secondary deliverables: Primary deliverables are must-have items. Secondary deliverables are things that are needed, but don’t take immediate priority. Usually items like report pages, admin screens, data exports, and print friendly pages make great secondary deliverables. Coming to an understanding over what items are the most important can be a huge breakthrough in communication.

d) Make room for technical items: Every release, include at least one or two technical cleanup items. Politely insist on these items at every planning meeting. Explain the consequences of not following through. An example – the SSL certificate on the API is going to expire in 6 weeks. Unless that is updated all users will get locked out of the application.

e) Be honest about your limitations: It can be hard to admit you need some help or that a specific part of the project isn’t suited to your skill set. For rock star developers it is tempting to take on everything all at once. I always tell people – I can’t pick colors or draw… for the sake of the product let’s get a designer on board sooner rather than later so I can implement what they come up with and we can stay on schedule.

Another tool – Nonviolent Communication:

This post was inspired by the book Nonviolent Communication (NVC) by Marshall Rosenberg. NVC explains a formula for effective communication.

As a software developer I liked the way it was presented as a ‘recipe for communication’ with lists of wording choices.

The basic formula is:

  1. State observations that are objective in nature and seek clarification.
  2. State how those observations are making you feel using specific language.
  3. State your needs.
  4. Make specific requests so your needs can be met.

Here is a list of descriptive ‘feeling’ words: https://www.cnvc.org/sites/default/files/feelings_inventory_0.pdf

I don’t know if statements like “I’m feeling enchanted” or “I’m feeling wretched” are 100% work appropriate, but the idea is to be very specific about how you feel so the other side opens up.

NVC Applied during an ambush:

One day early in my career I recall being cornered in front of a white board by multiple senior managers. They insisted I launch a product by a certain date with a certain set of features. I told them the original estimate of four months was what it would take to get it done. They kept asking me questions like “what can we cut?”, “how can we do this cheaper?”, “is your estimate for that already padded?”.

We looked at every aspect of the project together. It went on for an entire afternoon. Every design decision was scrutinized. Fundamental items were scrapped. I walked out of there feeling drained and wondered what kind of people I worked for. In hindsight they were struggling to deliver on a bad promise they had made, all with the best intentions. The project ended up working out fine. I didn’t have to work evenings and weekends to get it delivered. Later I went on to clean up some of the technical debt in subsequent releases. Then I took another job (where I stayed for a long time) and washed my hands of the whole code base.

On that fateful day, had I known about the tools from Nonviolent Communication, I could have done the following:

1) Make observations:

Wow, hang on a minute here, let me catch my breath! I’ve noticed everyone is pushing hard to make this project as cheap and fast as possible.

1b) Seek clarification:

What changed? Did we lose funding or did a customer back out?

Are you sure you want to do without feature X? My concern is that it is an essential element of the project.

I’d like to know what Joe from down the hall thinks about changing feature Y.

Maybe we should call in project manager Jane to provide some input because my expertise is in software not project management?

2) State my feelings in a clear manner:

I’m feeling lost because our commitment to quality and customer satisfaction can’t be upheld when we rush through our work. I’m also feeling flustered because non-technical people are overriding the technical details of a product I am responsible for.

3) State my needs:

My needs are to deliver work that is complete, done to the best of my abilities, and aligned with what my manager and the business expects of me.

4) State my requests:

Would the leadership team be willing to phase in everything in my original spec over a 4 month period, with a soft launch in 3 months? Would the leadership team be willing to allow key technical items that ensure quality as part of each release over the 4 month period?


Django Automatic Slug Generator Abstract Class

Using Django, an example of how to auto populate a slug field based on the name or title of whatever the thing is. Correctly handles duplicate values (slugs are unique), and truncates slug if value too long. Includes sample unit tests!

The built-in Django models.SlugField() is basically the same thing as a models.CharField(). It leaves the work of populating the slug up to you. Do not override the save() method of your concrete model and copy and paste that code to every model in your project that needs the behavior (that causes dementia).

Django makes it possible to elegantly abstract the ‘fill in my slug’ behavior into an abstract base model. Once you set up the model the slug generation is entirely automatic. Just leave the slug field blank in your app code, call .save(), and the slug will get populated.

This works for me with Django 1.8 and 1.9 on Python 2.7, and probably with earlier versions of Django too.

What is a slug?

A slug is a unique, URL friendly label for something, usually based on the name of whatever that thing is. For example, the book Fluent Python’s slug might be ‘fluent-python’. This blog, powered by WordPress, makes extensive use of slugs. Every post gets one, as you can see in the URL bar.

What about corner cases?

If the title is too long for the slug field, the value will be truncated. In the case of a duplicate name (which may be okay depending on the model), the slug will get suffixed with a number, eg ‘fluent-python-2’.  There are some unit tests below so you can carry these into your project too and be confident.
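The truncation and suffix rules can be sketched in plain Python. This is not the code from the Gist, just an illustration under assumptions: the function names, the max_length of 100, the simplified slugify, and the use of a set to stand in for the database uniqueness check are all mine.

```python
import re


def slugify(value):
    """Simplified stand-in for a framework slugify: lowercase, hyphen-separated."""
    value = value.lower().strip()
    value = re.sub(r"[^a-z0-9\s-]", "", value)
    return re.sub(r"[\s-]+", "-", value).strip("-")


def unique_slug(name, existing, max_length=100, max_iterations=1000):
    """Return a slug not already in `existing`, truncated to max_length."""
    base = slugify(name)[:max_length]
    if base not in existing:
        return base
    for n in range(2, max_iterations + 1):
        suffix = "-%d" % n
        # Trim the base so that base plus suffix still fits in max_length.
        candidate = base[:max_length - len(suffix)] + suffix
        if candidate not in existing:
            return candidate
    raise ValueError("Unable to locate unique slug")
```

The key detail is trimming the base before appending the suffix, so a 100 character slug becomes 98 characters plus ‘-2’ rather than overflowing the field.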

My field names are not ‘name’ and ‘slug’!

That is okay. It is set up so you can customize the source field name and the slug field name on a per model basis. See the LegacyArticle example in the Gist.
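To show the shape of the per model customization without needing Django installed, here is a framework-free stand-in. It is not the Gist: the class attribute names and the LegacyArticle fields (headline, url_key) are hypothetical, the slug rule is simplified, and the uniqueness check is left out to keep it short.

```python
import re


class AutoSlugBase(object):
    """Plain-Python stand-in for an abstract base model that fills in a slug."""
    slug_source_field = "name"   # attribute to build the slug from
    slug_field = "slug"          # attribute to store the slug in
    slug_max_length = 100

    def save(self):
        source = getattr(self, self.slug_source_field)
        slug = re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")
        setattr(self, self.slug_field, slug[:self.slug_max_length])


class Article(AutoSlugBase):
    def __init__(self, name):
        self.name = name
        self.slug = ""


class LegacyArticle(AutoSlugBase):
    # Hypothetical legacy field names, overridden per model.
    slug_source_field = "headline"
    slug_field = "url_key"

    def __init__(self, headline):
        self.headline = headline
        self.url_key = ""
```

In a real Django project the base class would be an abstract model and save() would call the parent save() after setting the slug.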

Example usage in a TestCase:

row = Article()
row.name = 'The Computer Is Talking Sense'
row.save()
self.assertEqual(row.slug, 'the-computer-is-talking-sense')

# create another one with the same name, should get -2 added
row2 = Article()
row2.name = 'The Computer Is Talking Sense'
row2.save()
self.assertEqual(row2.slug, 'the-computer-is-talking-sense-2')

# change the name, the slug should change too
row2.name = 'Change is Good'
row2.save()
self.assertEqual(row2.slug, 'change-is-good')

# slug gets truncated
row = Article()
row.name = '0123456789' * 25
row.save()
self.assertEqual(row.slug, ('0123456789' * 10))

# slug gets truncated, accounts for suffix
row = Article()
row.name = '0123456789' * 25
row.save()
self.assertEqual(row.slug, ('0123456789' * 9) + '01234567-2')

# loop to trigger integrity error
# (assumes: from django.db import IntegrityError)
for i in range(1, 10):
    row = Article()
    row.name = 'loop'
    row.save()

row = Article()
row.name = 'loop'
# hijack the local attribute just this once
setattr(row, 'slug_max_iterations', 10)

try:
    row.save()
    self.fail('Integrity exception should have been fired')
except IntegrityError as e:
    # str(e) works on Python 2 and 3; e.message is Python 2 only
    self.assertEqual(str(e), 'Unable to locate unique slug')

My Answer To: I want to learn programming, should I attend a code school?

I recently had a reader ask me if they should attend a coding academy because they want to get into programming. Here is my answer:

There are many success stories involving code schools. In fact my grandmother was one of those success stories, but more about her later. Still, I’d be careful of code boot camps, code schools, hack schools, code academies and the like. Read the reviews and talk to grads who are a year or so ahead of you.  One code school in my home town recently shut down suddenly with no warning. Anecdotally, I’ve heard of code schools hiring their own grads as tutors or instructors just so they can claim a high percentage of their grads are employed in the field. At best that is called academic inbreeding, at worst it is a pyramid scheme. Aside from avoiding getting ripped off, you want to make sure of the quality of the curriculum. These schools often jump directly to frameworks and web development without providing proper fundamentals like what is a function, what is a variable, etc.

So yes, this equation is sometimes true:

$12k Code camp + $2k MacBook Pro = Job in software @ $60-110k/yr

but this makes more sense:

The Knack + Enjoyment = A career you love that pays well

Software takes a special knack and to be good you must love it:

To succeed in software you need many things, but at the top of my list is an innate knack and natural enjoyment for it. I’d rule those out first before dropping money on tuition.

My first criterion is you must have the gift for coding, and that is probably genetic. What goes into the innate ability to code has been studied and blogged about. The bottom line is, there is no way to teach everyone to write code. It doesn’t work in the same way that almost all of us absorb language as infants or grow up and join Facebook if we want.

My second criterion is you must enjoy writing code. Know anybody who can concentrate on abstract details for long periods of time? A lot of people I know can’t believe I sit in a room for 10-12 hours a day doing what I do. They say it would drive them totally nuts. Even very gifted and intelligent people struggle and end up hating it. I once had a calculus teacher who despised writing code. He couldn’t get his program to run, even though he said it was mathematically perfect… It was probably just a syntax error.

How many people have ‘The Knack’ as a percentage of the population?

The short answer is somewhere between 1.5% and 3%, but the numbers are pretty fuzzy.

[Image: The Knack]

Based on the Bureau of Labor Statistics, in 2014 in the US there were 2.3M jobs directly related to writing code. According to this infographic, of the 17M people in the ‘nerd economy’ worldwide, 44% were in the US. That is ~7.5M but may include some non-programmers. Let’s take a guess and say in the US it is 5M. Out of a population of roughly 320M in the US, that comes out to ~1.5% of people who write code for a living.
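For anyone who wants to check the back-of-the-envelope math, the figures quoted above work out as follows. The variable names are just labels for those figures, and the 5M value is the guess from the text, not a measured number.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


US_POPULATION = 320e6             # rough US population
BLS_CODING_JOBS = 2.3e6           # 2014 jobs directly related to writing code
NERD_ECONOMY_US = 17e6 * 0.44     # 44% of 17M worldwide, about 7.5M
GUESSED_US_CODERS = 5e6           # the guess used in the text

bls_share = percent(BLS_CODING_JOBS, US_POPULATION)        # about 0.7%
guess_share = percent(GUESSED_US_CODERS, US_POPULATION)    # about 1.6%
nerd_share = percent(NERD_ECONOMY_US, US_POPULATION)       # about 2.3%
```

So the 1.5% figure sits between the narrow job count and the broader ‘nerd economy’ count.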

If you walk past the average family on the street, you would see 1.8 children. If you walk past 100 people on the street, 1.5 of them would be employed in software development. Except in the Bay Area it would probably be closer to 50! The 1.5% estimate only reflects those who are active, not those who have the knack + enjoyment but do something else, nor those who were downsized. As a planet we can get that number higher if age bias goes away and more opportunity is provided for minorities, women, and people of low socioeconomic status.

What goes into ‘The Knack’ for writing code?

The following traits appear to be closely associated with coders:

  • Analytical skills
  • Problem solving
  • Rapid comprehension (fast learner)
  • Mathematical aptitude
  • Musical proficiency
  • More interest in truth than appearances
  • Good memory
  • Creativity
  • Do-It-Yourself (DIY) hobbies
  • Obsession with science fiction
  • T-shirt collections
  • Difficulty picking good looking color combinations

 

Software is a journey, it is cyclical, and the learning never stops:

The idea that anybody with $12k can become a great programmer in a matter of 5 months is so wrong. I’ve been programming for almost 20 years and I’m still improving. Who you work with and what you are working on matters. It may take a decade for everything to really start clicking.

We live in a golden age of technology expansion. Right now the world is experiencing another technology bubble. This one may not be as big or as violent as the dot com boom, but programmer demand is out of control. Overall I think demand for software will continue to grow for many years while being bridled by bust and boom business cycles. That is until self aware artificial intelligence gets loose and kills us all (software developers first no doubt).

I recall during the dot com boom my wages were artificially boosted, which I thought was permanent at the time. I also found myself working around a bunch of yahoos who had no business in software. They were ultimately weeded out of the field. That pattern is peaking yet again.

A CS degree, or some kind of complementary degree from an accredited university should be on your road map. To test the waters you might start with a free online course.

In software it is entirely possible to start off being self-taught – like I was. My first paying gig was at age 16. I was literally the kid down the street. At the time I was very rough around the edges. Side projects and eventual part time employment allowed me to pay my own way through college. It was hard, I clocked 20-30 hours per week and took 8-16 credits per term including summers. I got into the habit of running through flash cards every night before I went to bed. Side note – it turns out memories are best formed right before going to sleep, so studying before going to bed helps with retention. What I learned in college wasn’t as immediately valuable as my software skills, but it ended up being the perfect complement to my life. I learned how to write, how to analyze information, and how to grow up some. I also met my wife in college.

Your assignment:

To find out if software is the life for you, my advice is to get a cheap PC laptop, install Linux on it (Ubuntu or Red Hat), and start with one of Python, Ruby, or PHP, plus JavaScript and SQL. Online outlet stores like the Dell outlet and the Lenovo outlet have good deals on refurbished hardware (which is basically an extended burn in test).

Start going to local meetups and hack nights. Get in the habit of learning all the time. Whenever you see a word or acronym you don’t know, google it and make a flashcard for it. Flip through video presentations from past software conferences like OSCON, InfoQ, etc; much of the content is made available for free!

Check out some books on programming from the library. The web is great for bits and pieces, but a published book typically has more depth. The first chapter of most programming books will be about setting up your computer and installing the right programs (called your environment). Then you will write a program that prints ‘hello world’ on the screen. Note how you feel and how smoothly it went. If you are totally flummoxed by this, you may need some face to face help, which brings me to the next section.

Get a mentor:

There are many people out there willing to share their knowledge. Some will charge anywhere from $10-$100/hr, others ask nothing in return, and some work for pizza. Mentoring is something I plan to do for the rest of my life, especially in my twilight years to keep my mind healthy and to give back.

I wish I would have had more mentoring earlier in my career. My bosses were gracious enough to introduce me to a few senior people. I met with them every few months and emailed more often. I should have taken more advantage though!

It was a simpler world back then. There were fewer frameworks and languages vying for attention. In today’s world the ‘stack’ or the ‘tree’ of technology is really getting out of hand with dozens of options in each category. Talk through this with your mentor.

Some encouragement:

Anybody can get into the field of software, not just white guys like me. In the 1970’s my grandmother took a programming course, perhaps similar to today’s boot camps. She started on punch cards and later wrote Cobol for the IBM mainframe. They tried to bring her back out of retirement in the late 90’s to help fix Y2k bugs but she wisely declined. I suppose I got the knack from her. As a female, she was a pioneer in the tech industry. I’m really proud of her. Her department had a few female coders. I’ve always noticed companies hold onto their female coders. There is a huge movement out there to get more women and minorities in tech. I fully support that kind of thinking. Yes, at bad companies there is a glass ceiling, harassment, and the old boys club to put up with. Screw those kinds of places. Be like Grandma and go for it.
