Software Development Heaven – sit stand desk and Herman Miller chair

This year I joined the sit-stand work environment revolution and I love it! When it comes to being a successful software professional, investing in the right tools is important. Many of us overlook ergonomics, especially our desk and chair. In terms of productivity, I would argue our desk and chair are just as important as our workstation/laptop, keyboard, IDE, and even wifi connection. A fast internet connection suddenly feels much less useful to me when it doesn't come with an ergonomically designed sit-stand workstation.

Unfortunately most employers are cheap when it comes to providing good desks and chairs for their people. Thankfully this is changing. In 2006 Joel Spolsky pointed out that a top-of-the-line chair helps with staff retention and productivity. Averaged out over the life of the furniture, it costs less per day than toilet paper! Think about that the next time you use the bathroom.

I have gone through four chairs from big box stores in the past six years, some for $75, one for $200! This time around I did some research and have hopefully fixed the issue once and for all. I went all out and invested in the best chair I could find. That means skipping the big box stores completely; only boutique furniture stores that specialize in ergonomics have the right stuff. I also opted for a motorized sit-stand desk, allowing me to stand for part of my work day, further improving my posture.

BEFORE – sit down only, chair from big box store:

my desk before converting to sit stand

AFTER – sit stand motorized desk with Herman Miller Embody chair:

sit stand desk software
Sitting configuration.

sit stand desk, standing mode

Standing configuration with anti-fatigue mat.

I use an anti-fatigue mat when in standing mode. I spent about $40 on a good one from Amazon. My rug is pretty thin on top of hardwood floors, and after an hour without the mat I notice my feet start to ache. When standing I make sure not to lock my knees, and I shift my weight between feet or adjust my posture every few minutes. At first it is a little strange standing up and programming. It puts me in a different kind of mental zone where I feel an urgency to get things knocked out. It feels like 'the meter is running', so I need to get more done while I'm standing. I stand 2-3 hours at most in a day. The difference at the end of the day is noticeable!

I am overjoyed with this setup. My productivity is way up and I feel great!

This entire setup was $1,850 including freight shipping. That is less than a new 15” MacBook Pro! It should last 10-12+ years, much longer than your average development laptop. Doing the math, if this setup lasts just 10 years the cost is ~$0.50 per day. Seems like a no brainer. Plus it makes a nice tax deduction, and those are few and far between in the software business.

How to buy an ergonomic desk and chair – shop in person if possible:

When shopping for a desk and chair, I didn’t want to order it blindly off the internet, especially the chair. I recommend going to a showroom that carries Herman Miller chairs. Pacific Furnishings in Portland, Oregon carries the entire Herman Miller line. They have a huge selection of high end office furniture. It is a fun place to visit. I sat in the Embody model for about 10 minutes and noticed it improved my posture right away. The Embody makes it hard to slouch, but also has a nice rocking mode when in a conference call or watching a presentation. I also tried the Aeron and Mirra models, but I didn’t like them. The Embody was the chair for me. All Herman Miller chairs get great reviews but each model is unique and suits a particular body style.

herman miller embody for software professional

I just love the exoskeleton design of the Embody:

herman miller embody for software professional

Finding a good sit stand desk:

Sit stand desks may be harder to test out locally. Thankfully ErgoDepot has an office in Portland. The cheaper sit stand desks are way more wobbly when in standing mode. I considered the crank style desks because they are cheaper, but the crank takes FOREVER. So I went for the motorized version.

Motorized sit stand desk switch

The switch on the left powers the motor; the switch on the right raises and lowers the desk.

If you are on a budget, a do-it-yourself (DIY) standing desk is not that hard to make. Here is the makeshift one I built temporarily. I considered building a second standing desk just for my laptop, but that would take up extra floor space, and switching machines would break my train of thought.

DIY standing desk

I recommend rigging something temporary just to see if you like standing while working.

Taking it to extremes:

If you really want to take ergonomics in the workplace to an extreme, ErgoDepot has desks hooked up to treadmills! I’m pretty sure coding while walking would be an interesting skill to master, perhaps someday an Olympic sport. Interviewer: For this next interview question we’d like you to implement the bubble sort routine in C while jogging on a treadmill. Ready, set, go!

They say spend good money on anything that separates you from the ground – your bed, your tires, your shoes. Now I include my computer chair and desk in that list and I hope you do too!

Posted in Business, For New Developers, Work

AngularJS Review – A Sweet Client Side JavaScript Framework

AngularJS makes everything else look obsolete. I'm looking at you, Backbone and Ember… It is one of those things where, in hindsight, the approach appears obvious because it is so elegant. However, it took the web a good 15 years to arrive at this point. It makes jQuery look like VHS 😉

AngularJS large

AngularJS's biggest strength is how it automatically binds form data and DOM state in an intuitive spot called $scope. It keeps track of populating everything, firing events, showing/hiding blocks, even looping over, sorting, and filtering arrays. It just works.

A short LIVE demo with code samples:

AngularJS Demo - Client Side CRUD Prototype

An example of backing a page with JSON, allowing local additions, local sorting, local filtering by name, and AngularJS form validation. Note - this demo works with AngularJS 1.1.5.

[Live demo: a table of trees rendered with {{tree.name}} and {{tree.category}} bindings, a "Name Filter" input, and "Name required." / "Category required." validation messages.]

The script and the HTML5 markup appear with the live demo - note the "ng-*" directives in the HTML, those make the AngularJS wiring happen.

Other things I like about AngularJS:

  • Change a variable in the $scope in one spot and it is reflected in the DOM and everywhere else that variable is referenced. This works for variables used programmatically, including pagination, search filters, etc. Very slick.
  • Works great with JSON.
  • Supports dependency injection.
  • Unit test friendly.
  • Has its own rendering syntax, denoted by {{ some.data }}, which offers just enough power out of the box but allows you to extend it. That aspect reminded me quite a bit of Django.
  • Nothing is stopping you from using as many frameworks as you like alongside it. There's often no point in doing much with jQuery, but you can if you like.
  • It is flexible in terms of which modules you choose to utilize. AngularJS may be used to build a single page app, or stand alone in a more traditional full-page-load-per-request style app.
  • It is easy to get started. The first few minutes of looking at it are a mind bender. However, after watching a 1 hour intro video I was pretty well oriented. You need to understand $scope, $routes, and how the app and its controllers fit together. After a day with it, and some pointers from a colleague who had done a couple apps in it, I was knocking out features at a good clip, perhaps even faster than I could with jQuery.

Like any software it has its weaknesses. What I don’t like about AngularJS:

  • Still a little quirky. The docs don’t always line up with the version you may be using. I started with 1.0.8 then switched up to 1.1.5, which helped a lot.
  • In terms of keeping legacy AngularJS apps updated, it would be painful to take a fully debugged 1.0.3 app that works in production and upgrade it to 1.2.x. There is just too much going on under the hood. Note that I have not done that; it is just the sense I get from working with it and seeing what kinds of issues people are running into on StackOverflow.
  • AngularJS is asynchronous in nature which can be tricky to program against, especially when dealing with security in a single page app.
  • If you want to use an outside plugin, like a UI widget, be prepared for issues. It is up to you to make sure the widget events get applied to the scope, since it won’t know about them until you manually wire them in.
Posted in Application Development

Using MySQL with Encrypted SSL Connections

MySQL offers native support for connecting via SSL. By default this is available in AWS RDS MySQL instances. Using this connection method encrypts all data going back and forth between the client and the server, which prevents eavesdropping (aka packet sniffing). This is especially important with cloud hosting, where traffic sniffing may be possible by other customers. There are other ways to protect the traffic (ssh tunnels, VPN), and I discuss the pros and cons of these below.

I wanted to find out how much of a performance hit MySQL's SSL mode causes, so I ran a benchmark, which you can read about here. The performance penalty is pretty high - 20% and up, OUCH!


Application Layer Changes:

Connecting to MySQL in SSL mode requires extra connection options. On the command line this is as simple as adding the --ssl-ca option, which points to the *.pem file. In the case of X.509 client certificates, --ssl-cert and --ssl-key are also required. Note that RDS does not currently support X.509 client certs for connecting.

This translates into minor application level code changes. In addition the SSL cert files will need to be stored on the application server.
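To capture which options go together in each mode, here is a small sketch (Python; `mysql_ssl_args` is a made-up helper name for illustration, not part of any MySQL library):

```python
# Builds the extra mysql CLI options for an encrypted connection.
# Plain SSL needs only the CA file; an X.509 client cert additionally
# needs the cert and the key (both together or neither).
def mysql_ssl_args(ca, cert=None, key=None):
    if (cert is None) != (key is None):
        raise ValueError("--ssl-cert and --ssl-key must be given together")
    args = ["--ssl-ca=%s" % ca]
    if cert:
        args += ["--ssl-cert=%s" % cert, "--ssl-key=%s" % key]
    return args

print(mysql_ssl_args("rds-ca.pem"))  # ['--ssl-ca=rds-ca.pem']
```

Since RDS does not support client X.509 certs, the CA-only form is the one you would use there.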


For more information: Using SSL with MySQL

Changes to GRANT statements:

MySQL supports a GRANT statement modifier 'REQUIRE SSL' which will need to be applied to the application layer database accounts. This requires the appuser account to connect over SSL.

GRANT SELECT, INSERT, UPDATE, DELETE
ON database.* TO 'appuser'@'appserver' IDENTIFIED BY '****'
REQUIRE SSL;

Similarly, to require the client to present a valid certificate, the 'REQUIRE X509' modifier can be used:

GRANT SELECT, INSERT, UPDATE, DELETE
ON database.* TO 'appuser'@'appserver' IDENTIFIED BY '****'
REQUIRE X509;

Alternate Methods of Securing Data Transport:

Protecting data transport between the db server and the app server can also be done with ssh tunneling (using something like autossh) or a VPN. While ssh tunnels are a little hacky, a VPN is really the best option. Both approaches delegate the encryption to the network layer, making it transparent to the application. This sort of work is handled by the dev ops / networking / sys admin team. Setting up a secured connection correctly so it is highly available takes skill and is not cheap. This is data security we are talking about, something to take very seriously!

With AWS RDS, ssh tunnels and a VPN are not feasible since MySQL is provided as a service. With RDS the underlying network and platform details are not accessible. It is not clear if the AWS Virtual Private Cloud (VPC) solution offers protection against traffic sniffing in relation to an EC2 app server connecting to an RDS database. With the 20% minimum performance hit from enabling SSL, that gives your team a lot to consider.

Why care about encrypting traffic between the app server and the db server?

In many cases, the connection between the application server and the database server can be left unencrypted.

The most common starter case is an application connecting to localhost for its database. No need to worry about encryption there since everything is on the same box.

Moving to a two tier or n-tier model where the application servers and the database servers reside on different hosts, the traffic may or may not need to be encrypted between them. If the hosts are all in the same rack sharing the same secured switch, or the traffic is on a trusted network, then the threat of packet sniffing is minimal.

This all changes the second you deploy to AWS or other cloud provider. Traffic between hosts goes across the cloud provider’s internal network. A cloud provider’s network is something you as a customer do not control, and in fact share with every other customer. When the underlying network is a shared resource, traffic sent between your servers should always be encrypted since you don’t know who might be listening.

It could be argued that unimportant data like system logs or metrics can be sent unencrypted. I agree, but it should be evaluated on a case by case basis. Customer names, email addresses, account numbers, and other personally identifying information (passwords?!) do end up in log messages from time to time.


Posted in Sys Admin

Benchmark of MySQL with SSL on AWS RDS

I ran a very specific benchmark of MySQL: the native SSL connection performance penalty on AWS RDS.

When establishing a MySQL connection there is a way to tell the server to use SSL to encrypt the communication between the client and the database. When activated, the database does all of its normal work plus the extra work of encrypting the data, so some overhead is expected. Encryption technology and CPU power have come a long way since SSL was first introduced, so the actual impact might not be much.

The procedure was to set up an AWS RDS MySQL 5.6.13 m1.medium instance. A test database was filled with 10M rows to make it ~2.2GB in size. The load tests were performed using SysBench from an EC2 host in the same availability zone, with and without the SSL option enabled.

I ran into some problems with SysBench and SSL options on RDS, which I will explain below, but first to the RESULTS!

Results of SSL vs non-SSL:

MySQL with and without SSL, 25 threads:

                                   With SSL   Without SSL   % Penalty (SSL)
Transactions (total, 180 s run)      36,535        43,561   -16.1% fewer transactions
Read/write requests/sec               3,885         4,597   -15.5% fewer requests
Avg latency (ms)                        123           103   +19.4% slower
95th percentile (ms)                    172           126   +36.5% slower
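The percentage column follows directly from the raw numbers; a quick sanity check (plain arithmetic in Python):

```python
def pct_drop(without_ssl, with_ssl):
    # throughput metrics: how much lower with SSL enabled
    return round((without_ssl - with_ssl) / without_ssl * 100, 1)

def pct_rise(without_ssl, with_ssl):
    # latency metrics: how much higher with SSL enabled
    return round((with_ssl - without_ssl) / without_ssl * 100, 1)

print(pct_drop(43561, 36535))  # 16.1 (transactions)
print(pct_drop(4597, 3885))    # 15.5 (read/write requests/sec)
print(pct_rise(103, 123))      # 19.4 (avg latency)
print(pct_rise(126, 172))      # 36.5 (95th percentile latency)
```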


rds with ssl 1

rds with ssl 2

Yeesh, MySQL’s native SSL hurts!

  • Adds 20% to the response time, sometimes way more.
  • Cuts throughput by ~16%.

Personally, I was hoping for something like 3%… I'm surprised at how high the performance penalty is and would look at other options first, like a VPN. Response time and throughput may or may not be as critical as security (which can't be compromised), but this is not an easy tradeoff.

Other sources of information on the topic of MySQL SSL performance:

In 2011 yaSSL did a similar benchmark and noted a 15-40% performance penalty, with an average of ~17%. Their study was not specifically targeted at RDS and was run locally on a MacBook Pro.

The MySQL Connector/J 5.0 documentation had the following stat: “The performance penalty for enabling SSL is an increase in query processing time between 35% and 50%, depending on the size of the query, and the amount of data it returns.”

My results match the yaSSL results pretty closely, but they did not come close to the penalties the Java JDBC driver saw (though I did not test with Java in this case).

Limitations and Additional Testing Called For:

MySQL RDS instances always have SSL compiled in and enabled. This test compares the performance penalty of connecting with SSL vs a standard unencrypted connection. This test does not look at how the performance would change if SSL was disabled or excluded from the MySQL binary.

SysBench was not cooperating with MySQL SSL on RDS!

First of all, SysBench provides no documentation for the --mysql-ssl=on option, leaving you to rely on a series of error messages as your only clue. This SO answer was helpful. I ended up studying the source code and the mysql_ssl_set() function documentation. Not quite RTFM; in this case RTFC - RTF Code!

As of SysBench 0.4.12, the --mysql-ssl=on option requires the server's CA certificate, the client key, and the client cert. That effectively forces you into X.509 mode. This is hard coded into /sysbench/drivers/mysql/drv_mysql.c.

  if (args.use_ssl)
  {
    ssl_key= "client-key.pem";
    ssl_cert= "client-cert.pem";
    ssl_ca= "cacert.pem";

    DEBUG("mysql_ssl_set(%p,\"%s\", \"%s\", \"%s\", NULL, NULL)", con, ssl_key,
          ssl_cert, ssl_ca);
    mysql_ssl_set(con, ssl_key, ssl_cert, ssl_ca, NULL, NULL);
  }

As it turns out, the ssl_key and ssl_cert values are optional for SSL mode and only need to be supplied if you want to connect with a client X.509 cert.

The problem with this is AWS RDS provides a CA cert file only, not the private key needed to generate X.509 client certs! This was VERY annoying, but I rolled up my sleeves, recompiled SysBench (an adventure in itself), and got it to work with just the ssl_ca option.

# change sysbench/drivers/mysql/drv_mysql.c
# line 398 to:
mysql_ssl_set(con, NULL, NULL, ssl_ca, NULL, NULL);

# then recompile sysbench

At first I tried to create my own client cert by skipping the private key option, but that did not work; MySQL is smart about enforcing certificate authenticity. It also makes sense that AWS keeps their RDS private key private. The side effect is that MySQL's 'GRANT … REQUIRE X509' does not work with AWS RDS. Amazon needs to allow users to install their own CA certs into their RDS instances. It is a little disconcerting to me that every single RDS MySQL instance is sharing the exact same CA cert!

How I ran the tests:

Prepare a ~2.2GB database with 10M rows:

sysbench --test=oltp --mysql-host=host --mysql-user=user --mysql-password=*** --mysql-table-engine=innodb --oltp-table-size=10000000 --max-time=180 --max-requests=0 prepare

Run the test with SSL:

sysbench --num-threads=25 --max-requests=100000 --test=oltp --mysql-host=host --mysql-user=user --mysql-password=**** --mysql-table-engine=innodb --oltp-table-size=1000000 --max-time=180 --max-requests=0 run --mysql-ssl=on

Run the test without SSL:

sysbench --num-threads=25 --max-requests=100000 --test=oltp --mysql-host=host --mysql-user=user --mysql-password=**** --mysql-table-engine=innodb --oltp-table-size=1000000 --max-time=180 --max-requests=0 run


Other MySQL Benchmarks:

I benchmarked MySQL on RDS in a previous post, where I discussed the pros and cons of running RDS vs. MySQL on an EC2 instance. That test cost under $0.50 on AWS. I love it! This one was more expensive because of the troubleshooting with SysBench.

I encourage readers to play around with load test experiments. Getting past assumptions by conducting experiments and analyzing the data is a lot of fun!

Posted in Data, Sys Admin

Building SysBench in Ubuntu 13.04

When trying to build SysBench 0.4.12 you may get an error like:

/bin/sh ../libtool --tag=CC   --mode=link gcc -pthread -g -O2      -o sysbench sysbench.o sb_timer.o sb_options.o sb_logger.o db_driver.o tests/fileio/libsbfileio.a tests/threads/libsbthreads.a tests/memory/libsbmemory.a tests/cpu/libsbcpu.a tests/oltp/libsboltp.a tests/mutex/libsbmutex.a drivers/mysql/libsbmysql.a -L/usr/local/mysql/lib/ -lmysqlclient_r   -lrt -lm
../libtool: line 838: X--tag=CC: command not found
../libtool: line 871: libtool: ignoring unknown tag : command not found
../libtool: line 838: X--mode=link: command not found
../libtool: line 1004: *** Warning: inferring the mode of operation is deprecated.: command not found
../libtool: line 1005: *** Future versions of Libtool will require --mode=MODE be specified.: command not found
../libtool: line 2231: X-g: command not found
../libtool: line 2231: X-O2: command not found
../libtool: line 1951: X-L/usr/local/mysql/lib/: No such file or directory
../libtool: line 2400: Xsysbench: command not found
../libtool: line 2405: X: command not found
../libtool: line 2412: Xsysbench: command not found
../libtool: line 2420: mkdir /.libs: No such file or directory
../libtool: line 2547: X-lmysqlclient_r: command not found
../libtool: line 2547: X-lrt: command not found
../libtool: line 2547: X-lm: command not found
../libtool: line 2629: X-L/root/sysbench-0.4.12/sysbench: No such file or directory
../libtool: line 2547: X-lmysqlclient_r: command not found
../libtool: line 2547: X-lrt: command not found

First, install the mysql libs and necessary build tools if you have not already:

sudo apt-get install libmysqlclient-dev
sudo apt-get install gcc make build-essential libtool automake

Then, run the extra libtoolize and autogen commands which correct the issue:

./configure
make
#... problem happens
# keep going to fix it...
libtoolize --force --copy
./autogen.sh
./configure
make
sudo make install

This page describes a similar libtool error related to RANLIB; it was not my problem, but it may help you:
http://adminlogs.info/2012/11/19/libtool-error-with-sysbench-0-4-12/

Posted in Sys Admin

Are you Smart yet? Will everything become Smart someday?

Everything is getting capital 'S' Smart these days. Smart phones, Smart homes, Smart cars, in Epic there are 'SmartPhrases', and even a local cafe chain has something called Smart beans…

The Smart trend lumps together intelligent networks, big data, biometrics, domains that end in .io, and apparently organic farming practices. It makes Web 2.0 and the 'e' and 'i' nomenclature seem so yesterday.

The Smart trend actually takes a lot of smarts to pull off and will no doubt take many iterations to get right. Sounds like a lot of fun stuff to work on!

What is driving this change:

The cost of smartness is dropping based on two principles:

  • Sensing:
    Physical instruments that collect data are becoming more sophisticated, widely available, and easily networked. I personally love the idea of easily and cheaply connecting the world of software to the real world of atoms, photons, temperatures, pressures, ppms, and anything else that can be measured.
  • Storing:
    The cost of storing most ‘facts’ is already a rounding error. So, why not just store everything? If you doubt this, see my review of the book “Free”.

The ‘Smart Milk Carton’ concept:

There is a romantic idea of the future where the milk carton tells the fridge it is running low or expired. The fridge then adds milk to the household shopping list. The grocery store sees this and gives you a coupon for milk as you pull up to the store. Or, the delivery service drops off more milk automatically.

cat

That is a novel application of technology, but I’m not sure it solves a need that bothers people enough. Maybe for a restaurant or cafeteria this makes sense, but it would be a luxury for the average household.

Smart becomes practical when the information flow has economic significance:

As a thought experiment, apply the 'smart milk carton' technology to a hospital's store of medications. The hospital's inventory of drugs could become Smart. When a medication runs low or expires, more is automatically ordered. Orders to suppliers happen automatically, reducing human error and staff overhead. Analyzing the flow of medications can help reduce costs, ensure vital drugs are fully stocked, and forecast which drugs are needed when.
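The core of that reorder loop is tiny. Here is a toy sketch (Python; the drug names, quantities, and thresholds are invented for illustration, not taken from any real hospital system):

```python
from datetime import date

def reorder_list(inventory, today):
    """Return the drugs that should be reordered: low stock or expired."""
    orders = []
    for drug in inventory:
        if drug["on_hand"] <= drug["reorder_at"] or drug["expires"] <= today:
            orders.append(drug["name"])
    return orders

# Hypothetical stock levels for illustration only.
stock = [
    {"name": "epinephrine", "on_hand": 4, "reorder_at": 10,
     "expires": date(2014, 6, 1)},
    {"name": "saline", "on_hand": 500, "reorder_at": 100,
     "expires": date(2015, 1, 1)},
]
print(reorder_list(stock, date(2013, 9, 1)))  # ['epinephrine']
```

The interesting part is not the loop itself but wiring the sensors that keep `on_hand` accurate, and the analytics layered on top of the order history.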

The information flow surrounding a $4 gallon of milk once or twice a month is not that interesting or economically viable. The information flow surrounding the millions of dollars worth of medications used everyday is just one example of how hospitals will get ‘Smart’.

Smart devices measure you:

You know how if you don’t pay for something (like Facebook or Google) you are the product?

Well, our DNA, vital signs, and behaviors are about to be measured and commoditized in the coming decades like never before. Human biosensors are coming out that track every facet of our health (sleep patterns, nutrition, digestion, respiratory, cardio, etc.). Two popular products that already track health stats and sync to the cloud are fitbit and jawbone.

In practice, biometric data analysis has the potential to substantially lower the cost of overall treatment (and perhaps save lives). Right now, measuring things like a person's vital signs or EKG is still somewhat intrusive. Eventually it will be a matter of swallowing a capsule or having a subdermal implant that syncs to a smartphone.

Smart and the changes ahead:

There are social implications as the Smart revolution rolls out. The ‘haves’ and ‘have nots’ of this technology will be in two different and very unequal worlds. Those who have access to this technology will be able to remove variability from their business, maintain their health to a higher degree, and be in greater control of their environment.

A side effect is that monopolies may stay in power longer because of the extra edge that comes from wielding the most data and having the best toys. This will create additional barriers to entry for market newcomers. Maybe this has always been true though?

In 50 years today's world will look like the dark ages in terms of all the advanced knowledge we will have about our health, our potential, and our interactions as a society. Can you imagine what it was like 50 years ago to go on vacation to another country with just a guide book and a paper map? We'll look back and say, gosh, can you imagine what it was like not to know your blood chemistry stats in real time on your phone? Yes, cholesterol really does spike after eating a burger and fries!

For more reading:

The Human Face of Big Data by Rick Smolan and Jennifer Erwitt has dozens of examples of the application and promise of big data. The book lives up to its name. It is huge, twice the page size of a standard book!

bigdatabook

Milk and Cat image by ‘the bridge‘ on Flickr
Sensor image by Huskeflux on Flickr


Posted in Data

Stripe vs Paypal for Collecting Payments Online

A few months ago I upgraded a client's website to support Stripe alongside the prior Paypal implementation. The result: sales are up slightly, and about 60% of customers are choosing to pay with Stripe.

paypal logo     VS.stripe logo

Quick summary: I'm impressed with Stripe as a developer, and customers clearly prefer it. On the business side, the feedback from my client is: if you can put up with some of Stripe's reporting and accounting limitations, it is a clear win because the implementation cost is lower.

The Developer Experience:

Stripe has an easy to use control panel, a great API, and pretty decent documentation. They are by developers for developers and it shows. You basically drop in some JavaScript, and then write a very simple call to their API on the backend to complete the purchase. Your server never sees the credit card number. That makes life easy. Stripe supports REST, PHP, Ruby, Python, Java, iOS, and there are many third party libraries which offer additional language support: https://stripe.com/docs/libraries

Stripe is particularly strong because of their clear distinction between testing mode and live mode. Both modes are built on top of the exact same API. Testing failure codes (like a bad zip code) is so easy. The Stripe testing page has a list of special card numbers which trigger certain error conditions: https://stripe.com/docs/testing

Paypal's developer experience is clunky and reflects its 10-year-old design. The documentation can be pretty hard to navigate. Paypal has a developer sandbox site that requires a separate set of logins, and you have to stay logged into the sandbox site in order to test checking out when using Paypal buttons. This makes it annoying to test across multiple browsers. You are also forced to set up sandbox accounts and make sure they maintain a balance. Good luck figuring out how to add money… I wound up just making new accounts. It also appears the dev backend and the live backend are at times running different builds, which scares me.

Paypal has a callback system called IPN (instant payment notification), which is actually fairly simple to work with. I did not work with Stripe's callback hooks; however, reading Stripe's documentation on the subject, they appear to be far superior, especially when setting up recurring payments.

The Experience for the Customer Buying at Your Website:

Stripe’s checkout interface is incredibly streamlined and simple – follow this link for their demo: https://stripe.com/docs/checkout.  Even customer address fields are optional. The credit card collection screen operates as an overlay on your checkout page. Stripe supports just about all major browsers though at the time of this writing Opera is not supported. A Stripe account is not required to make the purchase. Stripe has a nice looking email receipt that goes out automatically.

A lot of customers don't like Paypal, and I tend to agree. Paypal is slow, and when using Paypal buttons there are more opportunities to abandon the purchase because of the extra screen changes. While customers in the USA can proceed through checkout and pay without a Paypal account, we had numerous complaints from customers outside the USA because Paypal was forcing them to open an account! This is actually what pushed my client over the edge in terms of pulling the trigger on the Stripe integration.

The Business User Experience:

Paypal integrates nicely with the bank, allows payments and invoices to be sent, and overall just works. Paypal has solid reports that make sense to business people.

Stripe wants to credit the bank account every day, creating a flurry of transactions. This can be really annoying, especially if you are used to transferring money from the Paypal account to the company bank account yourself. At the time of this writing Stripe is very thin on reporting capabilities. Since I started drafting this post they did introduce a way of downloading transactions as CSV, but it requires setting up a report filter each time and then clicking download (still only CSV). What my client wanted is a monthly PDF report with a summary total at the bottom. If Stripe emulates Paypal in this respect they will make most people very happy.

In terms of fees, Stripe and Paypal start off the same for small retailers. However, Paypal reduces their fees as your sales volume grows. I wish Stripe did this! Paypal cuts businesses a break after monthly sales hit $3,000, $10,000, and $100,000. Paypal also offers Micropayment fees which are good for companies that have an average transaction under $10. I’d say this is the only way Paypal really wins, but it is a substantial advantage until Stripe can scale out enough to offer volume discounts.

Paypal Merchant Fees:
https://www.paypal.com/us/webapps/mpp/merchant-fees

Stripe Pricing:
https://stripe.com/us/help/pricing
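To see how much the volume discount matters, here is a rough comparison (Python). The tier rates and the $0.30 per-transaction fee below are illustrative approximations of the published schedules at the time of writing; check the links above for current numbers:

```python
# Rough fee model: rate * volume + a flat per-transaction fee.
def monthly_fee(volume, transactions, rate, per_txn=0.30):
    return volume * rate + transactions * per_txn

def paypal_rate(monthly_volume):
    # Approximate tiers: discounts kick in past $3k, $10k, and $100k/month.
    if monthly_volume > 100000:
        return 0.019
    if monthly_volume > 10000:
        return 0.022
    if monthly_volume > 3000:
        return 0.025
    return 0.029

STRIPE_RATE = 0.029  # flat rate, no volume discount

# A shop doing $20,000/month across 400 sales:
stripe = monthly_fee(20000, 400, STRIPE_RATE)
paypal = monthly_fee(20000, 400, paypal_rate(20000))
print(round(stripe - paypal, 2))  # 140.0 saved per month with Paypal
```

Under these assumed rates, the gap only widens as volume grows, which is why the volume discount is such a substantial advantage for larger retailers.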

Customer Support:

Paypal customer support gets 1 out of 5 stars. It is the typical outsourced-support situation, with a phone tree from hell. I was on hold for way too long; I told them the problem with their system (echecks were not clearing), they acted confused at first, transferred me to some other department, denied the problem existed, then out of the blue a few days later told me it was already fixed…

Stripe is good about getting back with qualified people who understand my needs. 4/5 stars here - I just wish they responded faster. Stripe has an interesting policy where employee email works more like mailing lists, which creates an atmosphere of transparency: https://stripe.com/blog/email-transparency Sounds good to me!

System Performance:

For the most part, both Paypal and Stripe have had solid uptime. Very rarely, an email from Paypal gets lost or takes several hours to arrive. Very rarely, Stripe email receipts lag the actual transaction by as much as 90 minutes.

Posted in Business | Tagged , | Comments Off on Stripe vs Paypal for Collecting Payments Online

How to write a proper commit message

No matter which version control system you are using (Git, Mercurial, Subversion, TFS, CVS), a useful commit message is just as valuable as adhering to coding style and leaving behind useful comments.

Why care about commit messages?

  1. If you ever need to find when a change was introduced, you will be reading the commit log. Poorly written commit messages force you to diff the files to discover what happened.
  2. In open source projects, a lieutenant or BDFL will read the commit message and decide whether to go forward based on its content. In that case the commit message serves as documentation and partly as a marketing blurb for you. I have seen pull requests rejected solely on the basis of a nonconforming commit message.

Examples of poor commit messages:

"" - a blank commit message is totally useless, the shame!!!!

"bug fix" - doesn't describe what was fixed or why

"minor updates" - doesn't describe what was done, minor is a relative term

"adds button" - doesn't describe what button was added where or what it does

"typo" -  where was the typo? how to confirm it is fixed?

Recipe for a good commit message:

  1. Include the issue number at the start of the commit message.
  2. Include the module or section of the application.
  3. Describe the outcome from the end user’s perspective, changes to business logic, changes to defaults, behaviors, etc.
  4. Justify the solution. This may not be necessary if you can explain more about the commit in the issue tracking system (and you followed step #1 so the commit and the issue tracker are cross referenced). In open source projects, the commit message might be the only place to justify your approach though.
  5. Some projects are very strict about the length of the commit message’s first line. The Git convention is roughly 50 characters, with a hard maximum of 72, for the first line. That has to do with the fact that Git users, including its creator Linus Torvalds, are predominantly console users.
  6. Follow the team’s post commit hook standards. For example if the commit starts with “Fixes #123” it might push issue #123 to QA.
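Checks like these are easy to automate in a commit-msg hook. Below is a sketch of the subject-line checks such a hook could run, following the recipe above; the issue-key pattern (XYZ-123 style) and the messages are illustrative, so adapt them to your tracker.

```shell
# Hypothetical subject-line checks for a git commit-msg hook.
check_subject() {
    subject="$1"
    # Recipe step 1: start with an issue number.
    if ! printf '%s' "$subject" | grep -Eq '^[A-Z]+-[0-9]+ '; then
        echo "reject: missing issue key"
        return 1
    fi
    # Recipe step 5: keep the first line within 72 characters.
    if [ "${#subject}" -gt 72 ]; then
        echo "reject: subject over 72 characters"
        return 1
    fi
    echo "ok"
}

check_subject "bug fix"                           # rejected: no issue key
check_subject "XYZ-999 Fixes typo on login page"  # passes both checks
```

Wired into `.git/hooks/commit-msg` (which receives the message file as `$1`), the same function would block nonconforming commits before they ever reach review.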


Examples of good commit messages:

XYZ-1234 Backend support for multiple currencies

Everywhere a currency field is rendered the Money object automatically formats the data properly in the user's currency format. No changes to the UI code. Modules that use currencies include pricing, checkout, shipping estimation. Backend changes fully unit tested.

XYZ-999 Fixes typo on login page

The login page had a link to the forgot password page spelled 'phorget'. HTML level change. No other cases of 'phorget' present in code base.


Other tips about committing code:

  • Read your own code and the diff before committing. I cannot tell you how many bugs I’ve caught in my own code by performing this simple task. Sometimes I get up, take a little walk, then come back and look at my code with my QA hat on. It is hard to break down your own masterpiece, but the better you get at it, the more of a true master you will become.  So before committing that next chunk of code, ask yourself:
    • Is this up to snuff quality-wise? Would you want your co-workers to read it in its current state?
    • Is this code properly formatted?
    • Are the complex sections adequately commented?
    • What if variable X is the wrong type or null? (Think of null pointers in Java, or string vs numeric in JavaScript.)
    • What if an unexpected input is passed in?
    • Are all the corner cases in the unit tests covered?
    • Any straggling TODO items that need to be cleaned up?
    • Spelling / grammatical errors anywhere?
  • Never mix whitespace or formatting changes with a logic change. That makes it a nightmare for the next person to figure out which changes were relevant to the program’s logic and which were just formatting updates.
  • Commit in atomic related units. A commit should not include changes to unrelated sections of the code.
  • Favor smaller commits vs one HUGE commit.
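The self-review and no-mixed-whitespace tips above can be sketched in a throwaway repository: read your own diff, and let git catch whitespace problems before they sneak into a logic commit. This assumes git is installed; the repo, file name, and identity below are made up for the demo.

```shell
# Hypothetical throwaway repo demonstrating pre-commit self-review.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'total = 1\n' > calc.py
git add calc.py
git commit -q -m "XYZ-1 Add calc module"

# A logic change that accidentally includes a trailing space.
printf 'total = 2 \n' > calc.py

# Read the diff with your QA hat on before committing.
git diff

# git diff --check flags whitespace errors, so formatting noise
# never gets mixed into a logic commit.
git diff --check || echo "whitespace problem found"
```

For splitting unrelated changes into atomic commits, `git add -p` lets you stage hunks interactively so each commit covers one logical unit.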
Posted in Code, For New Developers | Tagged , , | Comments Off on How to write a proper commit message

What is a Data Scientist?

The world now generates zettabytes of data annually, and that is only projected to increase. The spread of smartphones, the abundance of sophisticated yet cheap sensors, the near-zero cost of storage, and the investment dollars behind Big Data are pushing demand for data scientists to the forefront.

Software professionals are perfectly positioned to benefit from this trend. Plus, Big Data is a lot of fun to work with!

So, what is a Data Scientist?

The term data science has been around for more than 30 years. Data science has been called a combination of statistics, data munging, and visualization. It has also been described as the intersection of hacking skills, substantive expertise, and math/stats (the well-known Venn diagram). Forbes article: A very short history of data science.

In today’s fast moving world of technology data scientists draw on a combination of skills. A data scientist might be part DBA, part computer scientist and part coder. A solid background in statistics with an understanding of research principles and a critical mind is required. Data scientists may be involved with pattern recognition, data visualization, artificial intelligence, computer vision, and the tools needed to organize and make use of big data (NOSQL, Hadoop, etc).

The good paying jobs for data scientists are in industry, usually large industry (since large industry currently has the means to capture Big Data and pay for its optimization). The expectation is that the ‘data scientists’ will uncover findings that generate economic benefit well beyond their salary. For example: mapping oil wells, optimizing shipping routes, predicting diseases before they show symptoms. These ‘data scientists’ understand how the business works and what matters to the bottom line. I take issue with labeling the profit-maximizing role as science, but more on that later.

Software developers who have familiarity with ‘data science’ gain a valuable specialization that could morph into a second career. The neat thing is, the experience acquired by a data scientist is timeless and grows in value over time. Consider that in software development, the shelf life of most skills is short, given rapidly changing trends.


Data Scientists and the Quest to Maximize Advertising Revenue:

Science is a sacred word and we should not so carelessly dilute the meaning of it.  At what point do we take away the word science, and put in analyst? For more: why the term data science is flawed but useful.

It has been pointed out there are too many ‘data scientists’ focused on trivial problems like maximizing advertising revenue.

Jeff Hammerbacher, former Facebook research scientist said in 2011: "The best minds of my generation are thinking about how to make people click ads." His conclusion: "That sucks."

Can it really be called science if the goal is to maximize advertising revenue for a particular social media website? Can it be called science if standard deviation, linear regression, r values, t-tests, and the like are given zero credence? I think not. Correlation is not causation!

Classification of Data Scientists:

To help explain the spectrum of data scientists, I’ve broken it down into the following four broad categories. To me the first two are clearly science, while the latter two are more of a grey area.

  • Theoretical Data Scientists work on the theory of data science and contribute to frameworks and tools other data scientists use.  This is essentially statistics, data storage, and computer science as applied to Big Data on a theoretical level (academics).
  • Applied Data Scientists are out to gain a better understanding of the world using big data.  Since ‘science’ does require rigor I see this grounded in academic rigor, but used in an applied manner. At the outset an applied data scientist’s job is to formulate hypotheses and test them using data.  In a perfect world, everyone benefits from their research findings and tools.
  • Industry Data Scientists use applied data science for a specific market problem, industry, or business for the sole purpose of maximizing profit. Industry Data Scientists must be proficient at communicating their findings to the business, such that it can be easily understood and acted on.  Training or experience in business, economics and accounting as it applies to the business domain is where the value is created. The roles of Business Analyst or Business Intelligence consultant are pretty similar.
  • Advertising Scientists may or may not be trained in data science and apply the craft towards maximizing clicks and optimizing A-B tests. May use pop-sci methods. Maybe we just drop the term scientist, and call these folks Advertising Maximizers?


Within the applied and industry categories, I envision two additional types:

The extroverted data scientist:

Data scientists who work with people will be required to write reports read by humans and influence their decisions. Data scientists in this camp will need to excel at communication with normal people.  The only way to get value out of the data is to communicate the findings to the decision makers.

The introverted data scientist:

Picture someone with several screens of data and code open at once, working on some algorithm.  They might look up from their dimly lit desk, pull out their ear buds, and ask their boss – “What field is the user experience stored in?”  

The more introverted data scientists will be working on cleaning up data, building tools, and engineering data feeds used by other systems. The data feeds will pass relevant events up the chain to other systems or human decision makers. Consider bio-metric data, intrusion detection systems, twitter sentiment feeds – all working to make sense out of very noisy data. The feeds must process the data in real time to be valuable, so speed is a factor.
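The kind of feed described above can be sketched as a generator that consumes noisy readings and only passes "relevant events" up the chain. Everything here is invented for illustration: the readings, the running-mean heuristic, and the threshold are toy stand-ins for a real anomaly detector.

```python
# Toy event feed: pass along only readings that deviate from the running mean.
def event_feed(readings, threshold=3.0):
    """Yield readings that differ from the running mean by more than threshold."""
    total, count = 0.0, 0
    for value in readings:
        count += 1
        total += value
        mean = total / count
        if abs(value - mean) > threshold:
            yield value  # a "relevant event" worth sending up the chain

noisy = [1.0, 1.2, 0.9, 9.5, 1.1, 1.0, 8.7]
print(list(event_feed(noisy)))  # only the spikes survive
```

Because it is a generator, nothing is buffered: each reading is examined as it arrives, which is the property that makes a feed like this usable in real time.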

I can’t wait to be able to stand inside my data sets, manipulate them in 3d and fly through them:

Idaho National Laboratory’s new 3-D computer-assisted virtual environment — or CAVE.

Person using virtual reality

(Image from the Idaho National Laboratory flickr collection).


Posted in Data | Tagged , , | Comments Off on What is a Data Scientist?

Tools to test your website’s performance

There are many developer oriented tools out there for testing web site performance. Many are free (free as in beer), some are open source (free as in speech), and a few are also subscription based.

If you are not using performance analysis tools in your development process, I highly recommend it. Tools are what take something good and make it great (including you). These tools focus mainly on page load time and caching performance. Reducing bandwidth also results in cost savings. Often the ‘problems’ these tools uncover are easy to fix.

YSlow:

The original innovator in this area was YSlow which started in 2007, backed by Yahoo!. YSlow is open source and operates as Chrome and Firefox plugins.

yslow

This site gets a B performance grade. I can live with that. Everything I’m getting an ‘F’ on is either out of my control or overkill for a personal blog. For example, gzip is not allowed with this hosting plan. The gravatar images don’t have an expires header set (that amazes me). Also, I’m not going to the trouble of setting up a CDN to serve images for my blog.

I was able to address the Etags and most of the Expires Header warnings by adding this to .htaccess:

#disable etags (the Expires headers below handle caching instead)
FileETag None

#set expires headers (requires mod_expires; the IfModule guard keeps
#the site up if the module is unavailable)
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/gif "access plus 7 days"
ExpiresByType image/jpg "access plus 7 days"
ExpiresByType image/jpeg "access plus 7 days"
ExpiresByType image/png "access plus 7 days"
ExpiresByType image/x-icon "access plus 7 days"
ExpiresByType image/ico "access plus 7 days"
ExpiresByType application/javascript "access plus 7 days"
ExpiresByType application/x-javascript "access plus 7 days"
ExpiresByType text/javascript "access plus 7 days"
ExpiresByType text/css "access plus 7 days"
</IfModule>


Alternatives to YSlow:

Keep in mind, YSlow runs from your machine, so any proxies on your network will skew the results. This is where external cloud-based tools come in. The following tools do pretty much the same thing as YSlow, looking at site load time, JavaScript/CSS evaluation, use of CDNs, etc.

Pingdom Site Tools:

http://tools.pingdom.com/fpt/

What I love about this tool is that it captures DNS lag time in addition to download and processing time, from up to 3 geographically distant servers. I’m a fan/customer of Pingdom’s alert service. There is a checkbox to make the results private.
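You can get a rough local version of those timing breakdowns from the command line with curl's `--write-out` timers. The local python3 web server and port 8099 below are only there to make the sketch self-contained; in practice you would point curl at your real site (and remember this measures from your own network, not from distant servers the way Pingdom does).

```shell
# Serve the current directory locally so the example has something to hit.
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
srv=$!
sleep 1

# curl reports DNS lookup, connect, time-to-first-byte, and total time.
timing=$(curl -o /dev/null -s -w 'dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s' http://127.0.0.1:8099/)
echo "$timing"

kill $srv
```

This makes it easy to script before/after comparisons when you fix something the analysis tools flagged.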

pingdom

Other Site Analysis Tools:

  • http://gtmetrix.com -> Very similar to YSlow and Pingdom Tools
  • http://www.webpagetest.org -> More of a network / file size analysis.
  • http://www.showslow.com/ -> Results are generally public, but you can download and run the program on your own server. Very useful if you have many websites to check and want to automate the process.


Going beyond just page load time:

HTML Validator:

Complete suite:

  • http://www.woorank.com -> Combines a site speed test, ranking, SEO evaluation, social media evaluation, and HTML validation. Woo is right, but you only get one free site evaluation per week (bah)… I’m impressed, but it is spendy.

Other tools I have bookmarked or use on a regular basis:

Posted in For New Developers, Work | Tagged , | Comments Off on Tools to test your website’s performance