DBAs are Out of Style and Now There’s a Hole In Your Database

The concept of a DBA (database administrator) has practically gone out of style as a full time job. DBA work, if it is being done, is handled by someone or something else, perhaps in a more vanilla way that works well enough for most systems. However, if the DBA work is being ignored, then technical debt is silently piling up.

There are still plenty of DBAs (about 120k) in the basements of large companies and government entities. It is just that growth in this area has really dropped off, in spite of the fact that it is a critical function of any data driven application. This is due to technological advancement (databases are getting easier to run) and lack of standards (it is not anyone’s problem if the data is wrong from time to time). Still, I think the Bureau of Labor Statistics is overly optimistic in its estimate that DBA positions will grow 11% between 2014 and 2024. Compare that to the 1.1M software developer positions with estimated growth of 17% over the same period.

So what exactly was a DBA and what did they do?

In the old days (80’s, 90’s, 00’s) the database administrators (DBAs) were in charge of all things database related.

  1. They made sure the database server was running correctly.
  2. They controlled how the data itself was structured so it could be stored efficiently.
  3. They made sure the maintenance scripts, upgrades and backups were running correctly.

Those three tasks are still pretty darned important to a successful system! Given that DBAs are not part of most software teams anymore, are we doing it right?

1) So who is now making sure the server is running correctly?

It used to be a lot harder to make a database run smoothly. RAID arrays had to be custom configured. There were many arcane commands just to get the database to run on the network. Default block sizes had to align, and memory, IO, and other settings needed custom configuration.

Database setup has for the most part been handed to IT / DevOps / The Cloud. AWS RDS, for one example, makes databases an on demand commodity service (Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server). In this regard, life couldn’t be easier for application developers to spin up a production ready database.

The default configuration is still something that needs to be checked over and tuned based on the amount of memory on the box but that can be done pretty easily in an afternoon by an application developer or other IT staff member.

2) So who is now making sure the data is ‘structured efficiently’?

In today’s world the words ‘structured efficiently’ amount to a loaded term. It could mean anything to anyone. But if you look around the room and don’t see anybody doing this, then it’s probably a good idea to log a task under technical debt and check into it.

It used to be that data storage was fairly expensive. Application developers were aware of the issue but not necessarily responsible for optimizing the cost of the system. DBAs had control here, and they earned their keep on that front alone.

With cheap storage, lots of RAM, and virtually unlimited bandwidth, efficiency at the level of the 1s and 0s is not as critical as it used to be for a general run of the mill application. To many business level decision makers, what makes something efficient is how soon it can launch. In today’s world the cost of data storage is essentially a rounding error in a company’s overall budget. So again, no real pressing need for a DBA. However, I think that without someone who knows their stuff at the wheel to make sure the data is sane, problems can crop up especially with maintainability.

There are a number of best practices to follow when it comes to storing data, and they vary by platform. For example, with a relational database, indexes, normalization, and eliminating stale data are super important. In the NoSQL world, though, duplication of data is expected and plays into performance goals.

One thing I see all the time in naive database designs is treating what is really historical data as a source of authority. It leads to huge problems. For example, let’s say you have an invoice from last week that links to a customer. What if the customer changes their address tomorrow? Should the invoice change? No! The invoice is a historical record and still needs to reference the customer, plus the address they had at the time they placed the order. Noticing slowly changing dimensions like this is another way DBAs earned their keep. Unfortunately, the concept of slowly changing dimensions isn’t taught in school and isn’t appreciated from the application development perspective.
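
As a minimal sketch of the invoice example, one option is to copy the address onto the invoice when it is created. The table and column names here (customer, invoice, billing_address) are made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL      -- current address, may change over time
    );
    CREATE TABLE invoice (
        invoice_id      INTEGER PRIMARY KEY,
        customer_id     INTEGER NOT NULL REFERENCES customer(customer_id),
        billing_address TEXT NOT NULL  -- copied at order time, never updated
    );
""")


def create_invoice(conn, customer_id):
    """Create an invoice that snapshots the customer's address as of now."""
    (address,) = conn.execute(
        "SELECT address FROM customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO invoice (customer_id, billing_address) VALUES (?, ?)",
        (customer_id, address),
    )
    return cur.lastrowid


conn.execute("INSERT INTO customer VALUES (1, 'Ada', '1 Old Street')")
invoice_id = create_invoice(conn, 1)

# The customer moves after the invoice was issued.
conn.execute("UPDATE customer SET address = '2 New Avenue' WHERE customer_id = 1")

invoice_address = conn.execute(
    "SELECT billing_address FROM invoice WHERE invoice_id = ?", (invoice_id,)
).fetchone()[0]
current_address = conn.execute(
    "SELECT address FROM customer WHERE customer_id = 1"
).fetchone()[0]

print(invoice_address)  # still the address at order time
print(current_address)  # the new address
```

The invoice still links to the customer, but its address no longer depends on whatever the customer record says today. A separate address history table is another common way to get the same result.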

Another thing that sometimes worries me is handing over the responsibility for the database schema to a web framework. Without a DBA, application developers are expected to handle database design and schema changes. Frameworks like Rails and Django come with built in tools to do it for you. That approach works up to a point. But in fact, many frameworks will do really stupid things, the kind a true DBA would laugh at, when it comes to JOINs or building reports. In my opinion it is best to fully understand what is going on under the hood when a framework takes over the responsibility of your database structure.

Sometimes the DBA is helpful, but sometimes they are just doing stuff to make their job seem important. I recall one DBA required that all tables be prefixed with ‘t_’. All fields had to be prefixed with ‘f_’ and end with an underscore and a character that denoted the field’s data type (i for int, c for character, d for date time, etc). The DBA also wanted views prefixed with ‘v_’ and stored procedures prefixed with ‘sp_’.

For example, let’s say there was a user table with columns user id and username. The DBA wanted to see:

[t_user].[f_user_id_i]
[t_user].[f_username_c]

Compare this to how I’d typically do it, with no prefixes or suffixes:

[user].[user_id]
[user].[username]

The column [f_user_id_i] is the kind of fussy standard DBAs were getting paid to enforce corporate wide in the late 90’s and early 2000’s. I can appreciate standards, but only when they add value. Adding readily available metadata to the name of a thing is redundant. It also makes application code and SQL painful to read. So in this respect, I’m really happy I now get to use the shorter, more concise form [user].[user_id] in my code without a DBA lording over it.

3) Who makes sure maintenance scripts, upgrades and backups are running correctly?

Sadly, this one is often neglected unless there is a dedicated sysadmin / IT / devops team watching over it. Again, if you look around the room and nobody is doing it, then you probably need to get it on the schedule fast. An out of date database engine is technical debt of the most easily combustible kind.

It is becoming more and more popular for application developers to be responsible for running the systems they write. This is the approach I advocate for with my customers because I want to be responsible for what I create. I want to fix bugs first. I want to keep the system running perfectly. Tasking application developers with db maintenance also allows the business to eliminate the DBA position. Sad, but again systems are getting easier to maintain. AWS RDS does maintenance patches for you in your sleep (for the most part).

Unfortunately, a lot of business people look at a software system like a fridge: something you plug in and leave for 20 years. The reality is software systems are more like custom race cars: one of a kind, built for a specific purpose, and high maintenance. The data in your application is sort of like the oil in the car. It flows through the system, needs to be cared for, and should not be ignored or left leaking!

The DBA may be gone, or at least morphed into a part time system admin, part time DBA, but someone needs to be tasked with treating the data like gold.

Posted in Application Development, Data, Sys Admin

Tips for Getting into Software Development Heaven

In programming hell the keyboard layout changes every day. The Internet is laggy. The backlog is so long, it can’t even load. The control and shift keys only work half the time. You may only use pico as your editor. Tab completion never works. OS updates come out every day. OS updates occasionally don’t work. Sticky notes with highest priority cover every surface. There is a lack of natural light. Partly because of the line of people at your desk and the manager breathing down your neck. Long meetings full of powerpoint presentations occur twice daily. There is no way to mark an email as read! The servers are always down. Of course, since it is hell, the air conditioning was never installed… ahhhh!!!!

Given that, my goal is to end up in programmer heaven which I imagine to be:

An office with a comfortable sit stand desk and a big window that natural light pours into. My workstation comes with a giant set of screens and noise canceling headphones. The project has clear timelines and well written specs involving really cool technical puzzles. The code is solid and we are always ahead of schedule. The customers love us. The company is profitable too!

Unfortunately, there is hell on earth in the programming world.

Many of us have encountered a code base so screwed up that maintaining it can cause PTSD or other psychological disorders. I’m talking about the kind of system that makes programmers run and hide.

It stands to reason, whoever was responsible for writing that crap is probably going to programmer hell, right? Actually, I’m not sure if being associated with such a project is grounds for being eternally damned. I really hope it is impossible for one person to screw something up that bad.

It is more likely the blame goes to the overall systems put in place which allowed the bad code base to come into existence. Big code bases are not necessarily all bad. I’ve worked on huge systems with hidden gems buried in them. Maintenance can actually be really satisfying once you get your bearings.

There are a lot of companies that don’t value software, don’t understand it, and honestly don’t like being forced to invest in it. That is where the hellish environments originate from. They can be avoided, and in many cases improved.

How do we get to programmer heaven and avoid hell?

Aside from working at a good company, and not being a blackhat hacker, there is a lot the individual developer can do to control their destiny.

Code side of it:

  • Name things so it is obvious what the thing is and/or does.
  • Comment code in the tricky / non-obvious sections and where the spec seems strange.
  • Code defensively – distrust incoming parameters (null, wrong type, incomplete, etc) and do not take external resources for granted (database, file system, APIs, etc).
  • Break up large methods into smaller chunks, and write unit tests if feasible.
  • When an error occurs make sure to log everything about the current state.
  • Use transactions whenever manipulating multiple records at once.
  • Read your own code changes before committing.
  • Read at least two books cover to cover on the main technologies you are working with so you don’t make the same noobie mistakes everyone else is making.
  • Launch code you can live with, don’t worry about gold plating it (appropriate to the context you work in).
  • Use a code linter and formatting rules common to the team.
  • When designing features and architecting for the ‘future’, consider one of my favorite acronyms – YAGNI (you ain’t gonna need it).
  • Make regular backups, especially before pushing changes live.
  • Always work to make the code base a little better with each commit.
  • Do not try and wall yourself off thinking it is ‘job security’ like a stupid barnacle.
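
Two of the tips above, coding defensively and using transactions, can be sketched together. This is only an illustration under assumed names (an account table and a transfer function that are not from any real project), using SQLite because it ships with Python.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move `amount` between two accounts as a single unit of work."""
    # Code defensively: distrust incoming parameters.
    if not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount must be a positive integer")
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = ?",
            (amount, from_id),
        )
        cur = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_id = ?",
            (amount, to_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"no such account: {to_id}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)

try:
    transfer(conn, 1, 99, 30)  # account 99 does not exist
except LookupError:
    pass  # the debit from account 1 was rolled back along with the failure

balances = dict(conn.execute("SELECT account_id, balance FROM account"))
print(balances)
```

Without the transaction, the failed second transfer would have taken 30 out of account 1 and put it nowhere. A fuller version would check the debit side too and log the state when it fails.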

Work side of it:

  • Be honest about estimates (maximum padding 2-3x). In other words, if you think it will take 8 ideal hours, tell them 16-24 at the most, but don’t tell them 36, then deliver it after 80 hours. Also don’t tell them 4 and deliver it after 7 with a bug included.
  • Try to beat your estimates, and pass the savings back to the organization.
  • Share your knowledge, especially when asked, and sometimes by your own initiative (e.g. tune up the Readme file when it is out of date).
  • Look for holes in the spec and ask questions. Ask as many questions as it takes until both sides completely agree on what it is that is being built.
  • When incurring technical debt, be open about it and get approval first. Don’t create technical debt on your own without your manager being aware. Keep a list of all the technical debt that everyone can refer to.
  • Understand the business first, then apply the technology to that. Being obsessed with technology for technology’s sake can lead to serious mistakes.

Another way to get to programmer heaven is to give back:

Steering people right is worth serious karma points. I’m working on that with my blog posts, and have published over 100 now at http://www.laurencegellert.com. In addition to that, Wiser Learning is a site I created to scratch my own learning ‘itch’ in the world of software. It keeps me motivated to learn with tools for planning and recording learning activities. It also helps me understand how my own knowledge fits together and where I need to improve (check out the Learning Graphs).

The good news is, it is free for learners! I’d love for you to try it out and hear your feedback.

Posted in Business, Code, Fun Nerdy

Some Tips on Improving your Social Skills (for Software Devs)

After 7 hours heads down working on code the last thing I want to do is talk to someone. In fact, I’m probably so in the zone by that point I’d likely go on to code for another 1-5 hours before calling it a day.

Writing code is what I’ve built my life around. The reality is most days I get a lot of emails, messages to reply to, and I have meetings to attend. So, even though I’m around code a lot and I love that, I’m also working with people constantly. The better you work with people the more value you add and the more indispensable you become.

One way to socialize with people is to get to know a little about everyone you work with. Memorize at least one thing each person is passionate about. All you have to do is bring it up and they will be happy to take over the conversation and tell you more about it.

Don’t waste time gossiping or bitching about work. Be the person who stays positive, or at least stays focused. It is okay to share about your interests outside of work. In fact that would be normal. The way you want to be abnormal is to talk about ideas regularly, events rarely, and people least of all or never.

Another good social skill is going out to lunch with people. At lunch, don’t bring up anything negative, don’t complain about work, don’t talk about office politics. Just talk about things that excite you. Mainly sit there and listen to what others have to say. Make eye contact while listening. In relation to what they are saying, it is okay to be interested or even offer support. Just make sure they know your individual focus (writing great code, getting in the zone, getting the API launched, etc) so you are not seen as a threat.

Choose your mode of communication wisely. The modes are basically chat, email, phone, and in person. Don’t write long emails; nobody reads those. Phone calls are great to cover complex issues that require a lot of back and forth. The pen is mightier than the sword, so if you are the one who gets to summarize and send out notes from a meeting, that is a pretty good place to be. To rope in difficult people, one idea is to type up a summary email after a conversation and cc your boss.

You may argue, ‘screw everybody, I’d rather just code’. If you isolate yourself at work, you had better be extremely sharp technically because that is all you are bringing to the table. Maybe that will be enough for a career? Given how fast technology changes, you’ll need to be learning like crazy on your own time to stay ahead.

Posted in Business, Work

Add Facebook Open Graph (OG) meta tags to your WordPress site with one function

See below for a basic WordPress function that adds Facebook Open Graph OG meta tags to your WordPress site. Works with WordPress version 4.7.2 (at this site).

Facebook OG meta tags have become a standard way of making embedded links look good in Facebook, Twitter, LinkedIn and hundreds of other sites. The idea is to give the site embedding your link some clues about the title, description, and featured image. Documentation about the OG standard can be found here.

See how my twitter feed is pulling in nicely formatted links with the image and description:

Here is the code for generating the Facebook OG meta tags in WordPress:

The resulting meta tags for this page:

<meta property="og:title" content="Add Facebook Open Graph (OG) meta tags to your WordPress site with one function"/>
<meta property="og:description" content="See below for a basic WordPress function that adds Facebook Open Graph OG meta tags to your WordPress site. Works with WordPress version 4.7.2 (at this site). Facebook OG meta tags have become a standard way of making embedded links &hellip; Continue reading &rarr;"/>
<meta property="og:type" content="article"/>
<meta property="og:url" content="http://lgblog.dev/2017/02/add-facebook-open-graph-og-meta-tags-to-your-wordpress-site-with-one-function/"/>
<meta property="og:site_name" content="Laurence Gellert&#039;s Blog"/>
<meta property="og:image" content="http://lgblog.dev/content/uploads/2017/02/ogtags.png"/>

The software engineer in me shudders at the php global above, but that is how it is done in WordPress land! I don’t claim to be a WordPress developer and I don’t market myself as such. But my blog is hosted with WordPress (which I think does a great job). So from time to time I need to hack out a customization. I tried an existing plugin but it didn’t work (hadn’t been maintained in several months). That is a pretty common situation in the world of free plugins…

The above function should work for posts and pages. To make the image come in, make sure to actually set the featured image. If you don’t see that on the right hand menu on the edit post / page screen, you may need to add add_theme_support( 'post-thumbnails' ); to functions.php like I had to. Read more about that here.

Hope this helps!

Posted in Code

How to Structure Ongoing Learning in the Software Field

Many software developers have a love/hate relationship with the amount of ongoing learning the profession requires. It can be really enjoyable to learn new things, especially languages and frameworks that are powerful and fun to use. That moment when a big idea crystalizes in the mind is priceless. At the same time it can be fatiguing to watch technologies change, especially the ones you’ve invested so much into.

Given the need to keep up, one thing I’ve concluded is, it is ultimately up to the individual.

Employers might pay for training. New projects may offer learning opportunities. Take advantage of those opportunities if you have them, but make sure to steer your own course. Be very aware if your job locks you into a legacy stack or you have settled into a coding rut. That leads to rusty skills and puts you at risk in the job market.

How I keep up in the software field:

1) Set broad goals that balance new interests, wild tangents, and core learning. For example:

A. This year dive into framework/language X. For me a few years ago that was getting back into Python and Django. Really enjoying it. Next on my list is TypeScript.

B. Try out technology Y in the next few months. In 2014 for me that was buying an Oculus Rift DK2. The goal was to build a virtual reality database explorer. It was a bust. Turns out VR technology makes me seasick. Hey, I just can’t code while nauseated! Recently my ‘new toy’ has been Angular 2, which seems pretty well designed and doesn’t make me gag.

C. Take a deeper dive into technologies you feel proficient in. Currently working my way through Fluent Python, which goes into the more obscure parts of the language, but has lots of great gems.

2) Whenever I encounter a syntax, acronym, or technology that I’m not familiar with, I look it up.

Earlier in my career I was having to look up things all the time! Today it is less frequent, but still common. This tactic has served me well.

3) Keep a journal of my learning and make it a habit.

See notes below on Wiser Learning.

4) Apply the knowledge some way – at work, in my blog, on twitter, etc.

The first three are the ‘input’ side of learning. Point #4 is the ‘output’ side, how you apply and make use of what you are doing, which gives you all important experience.

A tool to help with your learning:

I recently launched a site called Wiser Learning to solidify my own learning path and hopefully help others with their journey. Wiser Learning is in the early stages right now. It is completely free for learners. It is geared towards people trying to break into programming and for those who are already established but still learning (which is almost everyone in the field). I’m having a lot of fun doing it.

Right now Wiser Learning is a tool for individuals. The ultimate goal is to get enough developers using it that the best and most relevant learning content will surface on a daily basis. I plan to get a discussion forum going there soon. In any event, I will use Wiser Learning as a systematic way to connect with developers as a mentor.

Recording learning:

You probably don’t realize this, but even if you haven’t picked up a book or watched a video lately, you are constantly learning. Every time you read a blog post or google for how to implement a certain feature you are learning!

Many studies have pointed out the link between journaling and improved learning outcomes. Personally, I’ve found keeping a journal has boosted my motivation to learn. This in itself gives me confidence because I feel like I’m staying on track. Plus when I notice it has been too long since my last entry, I am extra motivated to crack open a book or watch a web video over my lunch break.  Click here to record your learning at Wiser Learning for the first time.

I’d love to hear your feedback about Wiser Learning. I’m also looking for help creating more Learning Graphs, more on those in my next post. Happy learning!

Posted in For New Developers, Work

Ping your sitemap to Google and Bing with a bash script

If you are using a sitemap.xml file you know you need to submit it to the search engines on a regular basis (say nightly). This is done via a GET request to each search engine’s ‘ping’ URL. Many of the solutions out there for automatically pinging your sitemap.xml file to Google and Bing rely on PHP, Ruby, or another scripting language. On Linux servers, a simple bash script using the wget command is sufficient and avoids complexity. As of December 2016, it looks like Google and Bing are the only two search engines that support this. Ask.com took their ping script offline, and Yahoo integrated with Bing. I wound up writing this post because I was annoyed with what I found out there. Django has a built in one, but it only supports Google and never shows output even with verbosity turned up.

The first thing you need to do is URL encode the URL to your site map.

This can be done using an online URL encoding tool.

The URL to the site map for my blog is:
http://www.laurencegellert.com/sitemap.xml, but the search engine ping URL accepts it as a query string parameter, so it needs to be url encoded.

The URL encoded version of my sitemap url is http%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml.
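
If you would rather not paste your URL into an online tool, the encoding can also be done locally. A minimal sketch using Python's standard library (the variable names are just for illustration):

```python
from urllib.parse import quote

sitemap_url = "http://www.laurencegellert.com/sitemap.xml"

# safe="" makes quote() encode "/" and ":" too, which is what a
# query string parameter needs.
encoded = quote(sitemap_url, safe="")
print(encoded)  # http%3A%2F%2Fwww.laurencegellert.com%2Fsitemap.xml
```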

Use the URL encoded version of your sitemap url in the script below where indicated.

Place this script somewhere on your system, named ping_search_engines.sh.

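#!/bin/bash
# Ping Google and Bing with the sitemap URL. Output goes to stdout so cron can log it.
echo "------------- Pinging the search engines with the sitemap.xml url, starting at $(date): -------------------"

echo "Pinging Google..."
# Quote the URL so the shell does not treat the ? as a glob character.
wget -O- "http://www.google.com/webmasters/tools/ping?sitemap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "Pinging Bing..."
wget -O- "http://www.bing.com/ping?siteMap=YOUR_URL_ENCODED_SITEMAP_URL"

echo "DONE!"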

The -O- part tells wget to pipe the output to standard out, instead of a file. That means when you run it manually it displays the output on screen. Wget’s default behavior is to save the returned data to a file. A less verbose mode is -qO-, which hides some of wget’s output, but I prefer to have all that information in the log.

Run chmod +x ping_search_engines.sh so the file is executable.

Add the following cron entry, which will trigger the script every night at 1am:

0 1 * * * /path/to/ping_search_engines.sh >> ~/cron_ping_search_engines.log 2>&1

This script is a good initial way to get going for a simple website. For heavy duty websites that are mission critical, or that your job relies on I’d take it a few steps further:

  • Locate the cron log output in a directory that gets logrotated (so the log file doesn’t get too big). The output is small so even after running it for a year or more the file won’t be that large, but like all log files, it should be setup to auto rotate.
  • Scan for absence of the 200 response code and alert on failure.
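
The second bullet could be sketched roughly like this. It assumes wget's default English output, which reports each response on a line like "HTTP request sent, awaiting response... 200 OK". The function name and the sample text are made up for illustration and are not captured from a real run.

```python
import re

# Matches the status line wget prints for a successful request.
OK_RESPONSE = re.compile(r"awaiting response\.\.\. 200 OK")


def ping_succeeded(run_output, expected_pings=2):
    """True if the output of one run shows a 200 response for every ping."""
    return len(OK_RESPONSE.findall(run_output)) >= expected_pings


# Illustrative output for one run of the script (two pings).
good_run = (
    "Pinging Google\n"
    "HTTP request sent, awaiting response... 200 OK\n"
    "Pinging Bing...\n"
    "HTTP request sent, awaiting response... 200 OK\n"
    "DONE!\n"
)
bad_run = good_run.replace("200 OK", "503 Service Unavailable", 1)

print(ping_succeeded(good_run))  # True
print(ping_succeeded(bad_run))   # False
```

Since the cron entry appends every night's output to the same file, a real check would look only at the most recent run and then send an email or whatever alert you already use.
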
Posted in Sys Admin

Backup to AWS S3 with multi-factor delete protection and encryption

Having secure automated backups means a lot to me. This blog post outlines a way to create encrypted backups and push them into an AWS S3 bucket protected by MFA and versioning, all with one command.

There are four parts to getting the backup big picture right:

Step 1 – secure your data at rest

If you don’t secure your data at rest, all it takes is physical access to get into everything. The first thing I do with a new machine is turn on whatever built in encryption is available. MacOS has FileVault. Ubuntu offers disk encryption plus home folder encryption. With the speed advantages of SSDs I don’t notice a performance penalty with encryption turned on.

Step 2 – backup to an external hard drive

On Mac I use Time Machine with an encrypted external HDD as a local physical backup.

Step 3 – push to the cloud

In case of theft, my backups are encrypted and then pushed to the cloud. There are lots of cloud backup providers out there, but AWS S3 is an economical option if you want to do it yourself. The setup is much easier than it used to be!

As of November 2016, S3 costs about $0.03 / GB per month for standard storage, but you can get it down even more by using Infrequent Access storage or Glacier storage. Since I’m not backing up that much, maybe 20GB, it is a significant savings over services that charge $10/month. Most importantly, I have more control over my data, which is priceless.

Step 4 – verify your backups

Periodically, restore your backups from Steps 2 and 3. Without verifying your backups, you have no backups; you just think you do.
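
One way to check a restore, sketched here with made-up function names, is to compare checksums between the original folder and the restored copy. The demo at the bottom uses throwaway temp directories to stand in for the real data and a restore.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems


# Demo: one file restored intact, one restored truncated.
with tempfile.TemporaryDirectory() as original, tempfile.TemporaryDirectory() as restored:
    (Path(original) / "notes.txt").write_text("hello")
    (Path(original) / "todo.txt").write_text("buy milk")
    (Path(restored) / "notes.txt").write_text("hello")
    (Path(restored) / "todo.txt").write_text("buy m")
    problems = verify_restore(original, restored)

print(problems)  # ['todo.txt']
```

Files you edited after the backup was taken will show up as differences too, so this works best right after a backup or against a folder that doesn't change much.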

Overview for getting AWS S3 going:

  1. Enable AWS MFA (multi-factor authentication) on the AWS root account.
  2. Create a bucket that is versioned, private, encrypted, and has a 1 year retention policy.
  3. Setup a sync job user in AWS IAM (identity and access management).
  4. Install AWS CLI tools, authenticate as the user created in step 3, and start syncing data!
  5. At this point you can sync your files unencrypted and rely on S3’s encryption (which requires some extra configuration of the IAM user and s3 sync command), or you can make tar files and encrypt those with something like gpg (linux, Mac) and then push those to the cloud. This article explains the latter.

With this setup, if files were deleted maliciously from S3, the root account can go into the bucket’s version history and restore them. That in turn requires the password PLUS the authenticator device, making it that much more secure.

ALERT / DANGER / DISCLAIMER: I’m not a security expert, I’m an applications developer. If you follow this guide, you are doing so at your own risk! I take zero responsibility for damages, lost data, security breaches, acts of god, and whatever else goes wrong on your end. Furthermore, I disclaim all types of warranties from the information provided in this blog post!

Detailed AWS Bucket Setup with Sample Commands:

1) Enable AWS MFA

AWS root logins can be set up with multi-factor authentication. When enabled, you login with your password plus an authenticator code from a physical device (like your phone). The app is free. In theory this protects from malicious destruction of your data in case your root password is compromised. The downside is, if you lose the authenticator device, you have to email AWS to prove who you are to get it re-issued. ENABLE MFA AT YOUR OWN DISCRETION. Make sure to read the docs and understand how MFA works.

To enable MFA, login to your root AWS account, click on the account menu in the upper right and go to ‘Security Credentials’. You’ll need to install an authenticator app on your smart phone and scan a QR code. The instructions will walk you through it.

2) Create a new S3 bucket

Navigate to the ‘S3’ module, and create a new bucket.

  1. Pick a region that is on a different continent for maximum disaster recovery potential. I went with Ireland for price and geographic reasons, but there are many options to choose from.
  2. Under Properties, Versioning, enable versioning on the bucket.
  3. If you want to automatically clean up old deleted files, you might setup a rule under Lifecycle, Add Rule, Whole Bucket, Action on previous version:
    1. Check Permanently Delete – 365 days after becoming a previous version
    2. Check “remove expired object delete marker”

3) Create a sync job user.

A new user with a very narrow permission set will be used to backup your data into the bucket you just created. The sync user is only able to read and write objects in the S3 bucket and nothing else. Importantly, the sync user is not allowed to delete buckets!

Under AWS’s Identity and Access Management (IAM), add a new user ‘sync-user’. This user can delete objects, but because the bucket is versioned the data is still there, just flagged as deleted. Make sure to save the access key and secret key it generates somewhere safe like your KeePass file.

Give the new user the following custom inline policy. Click on the newly created user, go to the Permissions tab, expand Inline Policies, click Create User Policy, select Custom Policy. Name it something like ‘backup-policy’.

For the Policy Document, copy the following verbatim, except the bucket name. Replace BUCKET_NAME_HERE, which appears in two places, with the name of your bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME_HERE"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::BUCKET_NAME_HERE/*"
            ]
        }
    ]
}
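
If hand editing the bucket name in two places feels error prone, a small helper can generate the same document. This is just a sketch: the function name and the bucket name my-backup-bucket are placeholders, and the output still gets pasted into the IAM console as described above.

```python
import json


def backup_policy(bucket_name):
    """Build the inline policy above with the bucket name filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object reads, writes and deletes apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


policy = backup_policy("my-backup-bucket")  # placeholder bucket name
policy_json = json.dumps(policy, indent=4)
print(policy_json)
```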

4) Get AWS CLI set up and start syncing!

Amazon provides a set of command line tools called AWS CLI. One of the commands is s3 sync, which syncs files to an S3 bucket, including subdirectories. No more writing your own script or using a 3rd party library (finally).

  1. To install AWS CLI on Mac:
    sudo pip install awscli --ignore-installed six

    See the official AWS CLI install instructions page for more details.

  2. Configure AWS CLI: In a terminal run the following:
    $ aws configure

    Paste in the keys from the user created in step 3 above.

To sync a folder to the bucket, run the following command in the terminal. Replace BUCKET_NAME_HERE with your bucket’s actual name.

$ aws s3 sync ~/folder s3://BUCKET_NAME_HERE/folder --delete

Read more on the aws s3 sync command here.

5) A sample script that tars everything, encrypts, then copies to S3:

To get this working, you’ll need to install gpg and set up a public/private key with a very strong password. Make sure to back up that key. I have an article on GPG here.

#!/bin/bash
date=`date +%Y-%m-%d`
echo -------------  Backup starting at $(date): -------------------


echo "Copying files from my home folder, Desktop, etc into ~/Documents/ so they are part of the backup "
cp -r ~/.ssh ~/Documents/zzz_backup/home/ssh
cp -r ~/Desktop/ ~/Documents/zzz_backup/Desktop
cp ~/.bash_profile ~/Documents/zzz_backup/home/bash_profile
cp ~/.bash_prompt ~/Documents/zzz_backup/home/bash_prompt
cp ~/.path ~/Documents/zzz_backup/home/path
cp ~/.gitconfig ~/Documents/zzz_backup/home/gitconfig
cp ~/.my.cnf ~/Documents/zzz_backup/home/my.cnf
cp ~/backups/make_backup.sh ~/Documents/zzz_backup/home/

echo "Clearing the way in case the backup already ran today"
# -f keeps rm quiet when there is nothing from today to remove
rm -f ~/backups/*$date*


echo "Making archives by folder"
cd ~/

echo "    Documents..."
tar -zcvf ~/backups/$date-documents.tar.gz ~/Documents
# .. you may want to backup other files here, like your email, project files, etc


echo "GPGing the tar.gz files"
cd ~/backups
gpg --recipient {put your key name here} --encrypt $date-documents.tar.gz
# ... again, add more files here as needed

# NOTE: to decrypt run gpg --output filename.tar.gz --decrypt encryptedfile.tar.gz.gpg

echo "Removing the tar.gz files"
rm $date*.tar.gz

echo "Syncing to S3"
# Replace my-backups with your bucket's actual name
/usr/local/bin/aws s3 sync ~/backups s3://my-backups/backups --delete

echo Done!

Django Group By Having Query Example

Example Group By Having query in Django 1.10.

Let’s say you have SQL such as:

SELECT DISTINCT user_id, COUNT(*)
FROM my_model 
WHERE tag_id = 15 OR tag_id = 17
GROUP BY user_id 
HAVING COUNT(*) > 1

The above query finds all the user_ids in the my_model table with both tag id 15 and 17.
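
To see the query in action without a Django project, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout and the sample rows are made up for illustration, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_model (id INTEGER PRIMARY KEY, user_id INTEGER, tag_id INTEGER)"
)
# Made-up rows: users 1 and 5 have both tags, user 2 only has tag 15,
# user 3 has tag 17 plus an unrelated tag 20
conn.executemany(
    "INSERT INTO my_model (user_id, tag_id) VALUES (?, ?)",
    [(1, 15), (1, 17), (2, 15), (3, 17), (3, 20), (5, 15), (5, 17)],
)

rows = conn.execute(
    """
    SELECT DISTINCT user_id, COUNT(*)
    FROM my_model
    WHERE tag_id = 15 OR tag_id = 17
    GROUP BY user_id
    HAVING COUNT(*) > 1
    ORDER BY user_id
    """
).fetchall()

print(rows)  # [(1, 2), (5, 2)]
```

One thing this makes visible: the query counts rows, so it only means "has both tags" if each user_id / tag_id pair appears at most once in the table. A user with tag 15 stored twice would also come back with a count of 2.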

In Django a Group By Having query is doable, but you have to follow the Django way:

from django.db.models import Count

MyModel.objects.filter(tag_id__in=[15,17])\
.distinct()\
.values('user_id')\
.annotate(user_count=Count('user_id'))\
.filter(user_count__gt=1)\
.order_by('user_id')

Breaking it down:

MyModel.objects.filter(tag_id__in=[15,17])

This should look familiar. __in is a nice shortcut for an OR clause, but that isn’t material to the GROUP BY / HAVING part at all.

.distinct()

Adds the DISTINCT clause. It may not be relevant in your case. I’m doing it to be clear that I don’t want to see the user_id more than once.

.values('user_id')

This is where you specify the column of interest.

.annotate(user_count=Count('user_id'))

The annotate call sets up a named key and specifies the aggregate SQL function (COUNT, SUM, AVG, etc). The key user_count is arbitrary; you can name it whatever you want. It gets used in the next line and is the row dictionary key for the count value (think SELECT COUNT(*) as user_count).

.filter(user_count__gt=1)

This is what makes the HAVING clause appear in the SQL that gets executed. Note that user_count__gt is matching the named key created on the previous line and filtering it for values greater than 1.

.order_by('user_id')

The example SQL above doesn’t have an ORDER BY, but I’ve added it here to point out a quirk in Django. If order_by is left off, Django will add the model’s default sort column (the one set in the model’s Meta.ordering, if it has one) to the SELECT and GROUP BY sections of the query. This will screw up your results, because rows then get grouped by that extra column as well as user_id. So, when doing a Group By / Having query in Django, it is a good idea to explicitly specify a sort order even if it doesn’t matter in a logical sense.

It will return something like:

<QuerySet [{'user_count': 2, 'user_id': 1L}, {'user_count': 2, 'user_id': 5L}]>

Trying to get your query working in the shell? This might help: how to turn on SQL logging in the Django shell.


Django enable SQL debug logging in shell how to

How to get Django to show debug SQL logging output in the shell. Works in Django 1.10!

Start the Django shell:

python manage.py shell

Paste this into your shell:

import logging
log = logging.getLogger('django.db.backends')
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler())

The last line log.addHandler(logging.StreamHandler()) may not be needed if you already have a StreamHandler in your logging config. Worst case, you’ll see the SQL twice.
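
If you paste the snippet more than once in the same shell session, each paste adds another handler and the SQL shows up that many times. Here is a minimal sketch of a version that is safe to re-run. The function name is made up, and it only checks handlers attached directly to this logger, not ones inherited from your logging config.

```python
import logging


def enable_sql_logging(logger_name="django.db.backends"):
    """Turn on DEBUG output for a logger, adding a StreamHandler at most once."""
    log = logging.getLogger(logger_name)
    log.setLevel(logging.DEBUG)
    # Only looks at handlers attached directly to this logger, not its parents
    if not any(isinstance(h, logging.StreamHandler) for h in log.handlers):
        log.addHandler(logging.StreamHandler())
    return log


log = enable_sql_logging()
log = enable_sql_logging()  # second call adds nothing
```

This uses only the standard logging module, so it behaves the same inside or outside the Django shell.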

Next time you run a QuerySet it will show the SQL and the regular result:

>>> from django.db.models import Count
>>> MyModel.objects.filter(tag_id__in=[15,17])\
.distinct().values('user_id').annotate(user_count=Count('user_id'))\
.filter(user_count__gt=1).order_by('user_id')

DEBUG (0.001) SELECT DISTINCT `my_model`.`user_id`, COUNT(`my_model`.`user_id`) AS `user_count` FROM `my_model` WHERE `my_model`.`tag_id` IN (17, 15) GROUP BY `my_model`.`user_id` HAVING COUNT(`my_model`.`user_id`) > 1 ORDER BY `my_model`.`user_id` ASC LIMIT 21; args=(17, 15, 1)

<QuerySet [{'user_count': 2, 'user_id': 1L}, {'user_count': 2, 'user_id': 5L}]>

Quick guide to freeing up disk space in Ubuntu 16.04

Ubuntu retains apt package downloads. It also installs all locales (language packs) by default. These can be removed to reclaim as much disk space as possible. This should come in handy on VPS servers, especially the smaller ones backed by SSDs. For example, the smallest Linode server currently available has a 24GB SSD, so every free MB counts. On larger servers, it probably isn’t worth it though. I was able to reclaim about 200MB in my case.

In summary I ran the following:

sudo apt-get install localepurge
sudo apt-get clean
sudo apt-get autoremove
sudo reboot

After initial setup of a new Ubuntu app server it was using 1.86 GB. Then after running the above commands it was using 1.66 GB, so about 200MB was freed up. It may not be worth your while, and your mileage may vary (YMMV)!

Original Disk Usage:

laurence@appserver:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       24505644 1960172  21283904   9% /
...

Resulting Disk Usage:

laurence@appserver:~$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root       24505644 1743556  21500520   8% /
...
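
For the curious, the ‘about 200MB’ figure comes straight from the Used column of the two df outputs above. A quick sketch of the arithmetic (the variable names are just for illustration):

```python
# "Used" column (in 1K blocks) from the two df outputs above
used_before_kb = 1960172
used_after_kb = 1743556

freed_kb = used_before_kb - used_after_kb
freed_mb = freed_kb / 1024.0

print("Freed %d KB (about %.0f MB)" % (freed_kb, freed_mb))
```

That works out to 216616 KB, or roughly 212 MB.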

About localepurge:
Localepurge is a tool that gets rid of all the language packs you don’t need (because you can’t read 400+ languages). Run it at your own risk!

It defaults to keeping en-US packages, but loads a dumb-terminal-style menu that lets you pick the languages / locales you want to keep. Have fun wading through the hundreds of options in the menu if you want to change the settings. Use the tab key to get to the <ok> button.

It requires a reboot after being installed to finish the cleanup process.

Before using it, please read more about localepurge here: https://help.ubuntu.com/community/LocaleConf

About apt-get clean:

Ubuntu saves packages downloaded through apt before they are installed. This can start to add up.

To see how much space the cached packages are using run:

sudo du -sh /var/cache/apt/archives

Mine was only using 32MB at the time, but apt-get clean freed it up completely.

To delete the cached packages:

sudo apt-get clean

You’ll probably want to run that command periodically, or each time after running `apt-get upgrade`.

About apt-get autoremove:

This command cleans up packages that were installed as dependencies to packages that are no longer installed.

For example, if package A required packages B and C, but you uninstalled package A, packages B and C would still be around. Calling autoremove will ‘take care’ of packages B and C.
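
The idea can be sketched in a few lines of Python. This is a toy model only, not how apt actually works internally (real apt tracks which packages were installed automatically, handles recommends, and more). The function name and the package names A, B and C are just the ones from the example.

```python
def autoremovable(manually_installed, depends_on, installed):
    """Return installed packages that nothing manually installed still needs.

    A toy model only: real apt tracks auto/manual flags, recommends, etc.
    """
    needed = set()
    stack = [p for p in manually_installed if p in installed]
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        # Follow dependencies, so dependencies of dependencies are kept too
        stack.extend(d for d in depends_on.get(pkg, []) if d in installed)
    return sorted(installed - needed)


depends_on = {"A": ["B", "C"]}

# While A is installed, B and C are still needed
print(autoremovable({"A"}, depends_on, {"A", "B", "C"}))

# After uninstalling A, B and C are left over
print(autoremovable(set(), depends_on, {"B", "C"}))
```

The first call prints an empty list and the second prints ['B', 'C'], which is the set autoremove would clean up in this made-up scenario.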

On a freshly installed server, you probably won’t free up much space with this command but if your server has whiskers it may help.
