Sending emails through Comcast on Ubuntu using ssmtp

ssmtp is a lightweight mail package that is easy to configure and suits my needs during local development. It is basically a mail forwarder: it can’t receive email, and it has very few settings relative to a program like sendmail.

Comcast is notorious for requiring email sent on its network to go through its SMTP server. Not doing that can get your IP blacklisted and your legitimate emails flagged as spam. I resisted but was assimilated. These settings should work for most ISPs, not just Comcast.

Install ssmtp:

sudo apt-get install ssmtp

Configure ssmtp for Comcast:

You must set up an account with your ISP / email provider and enter the email/password below. I use a dedicated email account for development.

sudo vi /etc/ssmtp/ssmtp.conf

ssmtp.conf content:

root=postmaster
mailhub=smtp.comcast.net:587
UseSTARTTLS=YES
UseTLS=YES
AuthUser=myaccount@comcast.net
AuthPass=****
hostname=mymachine
FromLineOverride=YES

To test it out:

First, save a test message in the ssmtp format. Here is how my file looks:

$ cat testmessage.txt
To: youremail@gmail.com
From: you@comcast.net
Subject: test message

Test message for ssmtp.

To send the message:

ssmtp youremail@gmail.com < testmessage.txt
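
If the message never arrives, the mail log usually says why. A quick check (the exact log path can vary by distribution and syslog configuration):

tail -n 50 /var/log/mail.log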

For PHP compatibility:

Edit php.ini, look for the sendmail section, and set the following:

sendmail_path = /usr/sbin/ssmtp -t
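
To sanity check the PHP side, a one-liner from the command line works too. This is just a sketch: the address is a placeholder, and CLI PHP may read a separate php.ini, so sendmail_path might need to be set there as well.

php -r 'var_dump(mail("youremail@gmail.com", "PHP mail test", "Sent through ssmtp"));'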

Last step: restart Apache.

A KeePass setting that might save your online identity

Your KeePass file might not be as safe as you think, but it is easy to protect yourself with a simple settings change that does not require creating a new .kdbx file. It makes your KeePass file more resistant to dictionary and brute force attacks.

The setting is called ‘Key Transformation’, accessible in KeePass under File → Database Settings… → Security. This screenshot is of version 2.x, but 1.x also has this feature (minus the helpful one second delay button).

[Screenshot: KeePass Key Transformation setting]

What it does is run the master key through N rounds of encryption before applying it. The higher N is, the more time it takes your CPU to process all the rounds. The default is 6,000, which takes less than a millisecond for a modern CPU to churn through. My setting is in the high seven figures and takes about one second. That is a delay I can live with each time I open my KeePass file. In fact, it feels kind of good to be reminded the program is doing extra work to protect me.

The reason for introducing a delay is to slow a brute force attack to the point that it is infeasible in this lifetime. A brute force attack starts by trying every single character (A-Z, a-z, 0-9, symbols), then every two-character combination (aa, ab, ac…), then every three-character combination (aaa, aab, aac), and so on. A related approach, called a dictionary attack, loops through a dictionary and tries all words and various combinations of words with different delimiters. Eventually these approaches will find the master password. However, when N is high enough, each attempt costs the attacker one second (per CPU), which is a serious roadblock.

If your password is sufficiently strong, say 30 random characters drawn from A-Z, a-z, 0-9, and 10 possible symbols, that is an alphabet of 72 characters. That works out to 72^30, roughly 5.25 × 10^55 possible combinations. Only an attacker with a huge number of CPUs or a huge amount of time would be able to check them all. I doubt this little technique would deter high level national security organizations with billions of dollars in funding. However, I have a strong sense that a high N would deter script kiddies and cracking programs.
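
As a rough back-of-the-envelope check, here is the arithmetic from the shell (a sketch using bc; the figures in the comments are approximations):

# total combinations for a 30-character password over a 72-character alphabet
$ echo "72^30" | bc
# roughly 5.25 x 10^55

# worst-case years to try them all at one guess per second
# (the delay the key transformation buys you)
$ echo "72^30 / (60*60*24*365)" | bc
# roughly 1.7 x 10^48 years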

As CPUs get faster, N needs to increase to offset the time it takes to attempt a single crack at the master password. I plan to increase the value every time I get a new machine.

What the ‘average’ user sets their password to:

You know, it really isn’t very hard to achieve ‘better than average’ password security. The most common passwords are ‘password’ and ‘123456’, and many people reuse the same password across all their accounts.

Going beyond just a strong password:

Even a foolproof password may not be enough. Wired did a thorough write-up on how a weak password and social engineering, combined with basic flaws in processes at Amazon and Apple, led to a journalist losing his entire online identity. That is why I always set up the extra identity verification questions on my accounts, and I never use the same Q&A twice. I also use three different emails: personal, work, and private / banking. That way, even in the worst case scenario where a hacker is able to trigger password resets and get into accounts, the scope of the damage is limited.

What is KeePass?

For those who don’t know, KeePass is a FOSS program for managing passwords. One ‘master’ password gets you into all your other passwords. It can easily generate strong passwords. In fact, I don’t even know some of my passwords, since they were generated inside KeePass with the ***’s still showing; I pasted the generated value into whatever website’s sign-in form I was on, then immediately made a secure backup of the KeePass file so I don’t lose the new password. The coolest thing is the Ctrl+V auto-type feature, which tabs back to the previous window, pastes your username, tabs, pastes your password, and then hits enter to submit the form.

I’ve been using KeePass to manage my passwords for almost a decade. What I really like about it is how portable it is between Linux, Mac, and Windows. It also has ports to all manner of tablets and smartphones – but I would never put such a sensitive file on something that doesn’t have an encrypted drive.

Is KeePass secure?

I have not read the source and can’t vouch for it. I just know a lot of other software professionals who also use it. The fact that it is open source makes me feel better about it. It does encourage temporarily putting passwords into the system clipboard, which is arguably an insecure spot. Typing a complex password has its downsides too: a) it takes time, and b) keystroke loggers would be able to pick it up.

Here is an interesting article about someone who was tasked with cracking a KeePass file. The article doesn’t say how they cracked it, but the YouTube video comments say they “found it written on a piece of paper.”

LOL!

So the moral is, KeePass is as insecure as its operator is careless.

Why I use GitHub (or Bitbucket) at every chance, and why you should too

When I work on projects that don’t use GitHub or Bitbucket, I really miss them. It is the little things they do that speed work along and give me access to what I need in a way that looks visually pleasing.

This is not meant to offend, but for me GitHub and Bitbucket are pretty much the same thing. Bitbucket originally attracted me with its free private repos. All the work I do is under NDA, meaning it is confidential, and the code is usually owned by whoever I’m working for, so privacy really matters. In the course of my work I’ve used both GitHub and Bitbucket extensively, and for my purposes I really can’t distinguish between them. Others have tried to recently; it seems to come down to nuances between open source and enterprise development. That aside, I’ll just call the pair GitHub from now on so I don’t have to repeat myself.

5-speed manual vs automatic:

The difference between a project with and without GitHub is sort of like the difference between owning a car with an automatic transmission and one with a 5-speed manual.

I used to own an old BMW 3 series with a 5-speed (technically an E30). It had 3 floor pedals, the extra being the clutch for shifting gears. That car was a blast to drive! It had a tachometer in the dash too. I remember always being impressed that in 5th gear the speedometer and the tachometer were parallel. Pretty cool design and engineering philosophy by BMW. I just loved the way it responded, even though it had 180k miles when I bought it. Yeah it was expensive to maintain, but I was infatuated.

Sadly, this is less common today, but I also learned to drive on a manual. Just after my sixteenth birthday I took my driver’s test in a 5-speed Corolla. During the test I stalled it twice but still passed by one point.

That is the good and bad of the manual: it is more work, can be slower to shift, and is fatiguing to drive, but in the right hands, when you downshift and punch it out of a corner, there is nothing like it! It does just what you expect it to do at all times.

The enjoyment of shifting gears:

When it comes to source control, the git command line is my sporty 5-speed manual. I use git exclusively on the command line. I know my limits (by no means am I a git guru), but I get the job done day in and day out. There is satisfaction in the familiar routine of going through the gears (pull, commit, push, and the occasional merge/rebase). Everybody I know who can switch to git already has.
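
The daily trip through the gears looks roughly like this (a sketch; the branch name is made up):

$ git pull --rebase origin master      # sync up before starting
$ git checkout -b fix-login-timeout    # hypothetical topic branch
# ...edit code...
$ git add -p                           # stage changes hunk by hunk
$ git commit -m "Fix login timeout"
$ git push origin fix-login-timeout    # then open the pull request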

Sorry Subversion:

I suppose SVN is now the equivalent of an old rust bucket with a 3-speed on the column without a synchro (double clutch to get back into first). Sorry SVN, you were a trusty pal back in the day.

The ease of driving an automatic:

Using GitHub on top of git is what I consider an ‘automatic’. It does a lot of nice stuff intuitively that I don’t have to work at or think about too much.

My main use of GitHub is the web interface for browsing the repo. I love being able to compare branches, look at commits, study code, go back in time, make inline comments, etc, etc. The coloring of the output is very clear as to what is new code, what was removed, and which lines were changed. I will often have a handful of GitHub tabs open at once to get caught up on recent commits. Reading code recently committed by your team members is a good habit, even if not required by management.

To that point, fixing a bug correctly (without breaking something else) almost always involves determining its origins. With GitHub it is very handy to be able to literally ‘click’ into the past and search for keywords at a certain point, and then correlate those changes to commit messages. Then you know who to take the nerf bat to.

I have tried desktop GUI tools on Ubuntu and Windows for browsing repo history. They all come up way short and remind me of Windows 3.1 programs. The command line can be used for looking at recent changes and even code archaeology, but in practice it becomes too much to wade through.
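
For reference, the command-line equivalents of that archaeology look something like this (the keyword, file path, and commit hash are placeholders):

$ git log -S"keyword" --oneline             # find commits that added or removed a keyword
$ git blame -L 40,60 path/to/file.php       # who last touched lines 40-60
$ git log --oneline --since="2 weeks ago"   # recent history at a glance
$ git show abc1234                          # inspect a specific commit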

Managing pull requests in GitHub is really nice too; it will even warn you in advance if there is a merge conflict. The built-in wikis are nice. The README.md Markdown formatting is nice.

A project run through GitHub (or BitBucket) makes my work day easier, makes collaboration easier, and helps me feel like I’m right there with the rest of the team when I’m working remotely.

With the git command line and GitHub, we get the best of both worlds: the pleasure of the 5-speed (git CLI) and the convenience of the automatic (GitHub). Okay, it’s not a perfect analogy…

Some alternatives for the DIY project:

Don’t want to tie yourself to GitHub or BitBucket? I don’t blame you. There are many business cases for keeping code on servers you and only you control.

These projects are web-based repo browsers that work similarly to GitHub:

The Software Maintenance Efficiency Curve

I have been told “there is no such thing as greenfield development”. While that statement is false for the hobbyist developer, in the business world it is nearly true. Those who code for a hobby or for pure enjoyment often start from scratch, as evidenced by the explosion of unmaintained projects on GitHub. See my article about software ghettos for more on that. When it comes to software used in the real world, open source or not, maintenance is an everyday task.

Consider what goes on between the 1.0 and 1.1 release. Was that 100% new work, or did it include some maintenance to allow the 1.1 features to fit with the 1.0 architecture? Now fast forward to the 1.8 release: was the ratio of maintenance higher? Almost certainly.

An article by Robert Glass in IEEE Software (May/June 2001) called “Frequently Forgotten Fundamental Facts about Software Engineering” states that maintenance consumes 40-80% of software costs, and that enhancements account for roughly 60% of maintenance work!

Why care about quality?

Consider that businesses are not interested in (and probably can’t afford) a monument to computer science. What the average business demands is functional code. I have been involved with dozens of businesses, small, large, tech-centric, and technophobic – none have asked for fancy or perfect code. Anything beyond functional is seen as a waste, and I agree. This is not a license to take shortcuts and hack things together. If shown the distinction, a business doesn’t want a ghetto code base with anti-patterns everywhere that will soon become unmaintainable and cause developers to run and hide. In spite of this, it turns out a lot of systems are managed in a manner that contributes to major system outages, security holes, developer attrition, and occasionally huge monetary losses. Google ‘stock market glitch‘ for examples.

How can software maintenance work be done efficiently?

A great developer won’t make much of a dent if they are blocked from doing good work. The product owner should have a long-term plan for the system which includes keeping it healthy and maintainable. That plan should favor fixing existing bugs (see #5 on Joel’s list) and allocate time for paying down technical debt in each release. Technologies such as source control, a suite of unit tests, code linting, and build automation are extremely helpful. Policies on code style, documentation, learning, and knowledge sharing make a big difference too.

A team composed of a mix of veterans, mid-level staff, and junior developers makes for a healthy balance. The developers should be allowed to think they own it (a variation on a famous quote from Bill Gates). A culture of knowledge sharing should be encouraged and rewarded. Assumption checking should be considered normal and non-threatening. Have you ever read a spec that was 100% free of half-baked assumptions? Individual performance should take a back seat to team performance. Otherwise silos form, the incentives become twisted, and so does the code.

On the individual level a developer has three hills to climb to become maximally efficient:
1) The languages, libraries, and technologies used in the system.
2) The domain (the nature of the business).
3) The way the system was setup.

Languages and libraries should be a relatively low hurdle if the technologies used are ubiquitous and the right skills are hired for. Domain knowledge is harder to come by, though in some areas, such as insurance, finance, education, or ERP, a person with the right experience can be found. The third hurdle is by far the least visible to the business and the most challenging. It ultimately comes down to what is stuck inside the developer’s head that makes them efficient at maintaining the system. If the developer wrote the system from scratch, they get past that hurdle for free. That assumes they haven’t already moved on… perhaps washing their hands of a mess?

“Debugging is like farting – it’s not so bad when it’s your own code” – Unknown

The time it takes to attain mastery over a code base is proportional to its size and complexity. The best approach is to start with an easy task, then something slightly more complex in a different part of the system, then something in a third area, and finally circle back to the first area for a real challenge. This way confidence is built up steadily and the risk of breaking something critical is reduced.

The first few days to several months of working on an unknown system are the most stressful and error prone for a developer. Without knowing every aspect of the system it is easy to accidentally introduce new bugs. Without a senior developer or product manager to explain things it can be very confusing and frustrating to make headway. This is where developers with solid people skills and high self-esteem will shine, because they are not afraid to ask for help and are effective at getting good answers.

Development efficiency increases over time then plateaus:

[Chart: Software Maintenance Efficiency Curve]

The length of the orientation phase and of the growth phase increases with the size of the system. Both can be shortened with documentation, clean code, and, most importantly, friendly and knowledgeable team members. Hiring someone who already knows the languages, libraries, and domain also helps.

Let’s say things go well, and the developer climbs the efficiency curve after X days or weeks. Now they are really ‘making money’ for the business. This is the most efficient place for the developer to be, business-wise. The length of time a developer spends on top of the curve depends entirely on the company’s ability to retain that developer. The going advice is to pay at least a fair wage, be flexible, be organized, then stand back and let them go. Make sure to let them do interesting things from time to time. Offices with windows, sit-stand desks, and flexible hours are nice perks that don’t cost much when averaged out. The alternative is to lose the developer and go back to square one in the orientation phase with someone new.

How to setup the MySQL data directory to be in your encrypted home folder on Ubuntu 14.04

Ubuntu has built-in home folder encryption similar to OS X. I always turn on this feature on both OSs and have never experienced any perceptible performance hit. This guide shows one approach to migrating the MySQL data directory into the encrypted home folder on Ubuntu 14.04.

Caveats:

The only system user allowed to access the encrypted home folder is the user that owns that folder (e.g. your user). For this approach to work, MySQL must run under the same user that you log in as. The service must be started after you log in to the desktop. That can be automated by creating a script that gets triggered by the ‘Startup Applications’ program.
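
Here is a minimal sketch of such a script. The file name is my own choice, and it assumes a passwordless sudo rule for ‘service mysql start’, which you would need to add with visudo:

#!/bin/bash
# ~/bin/start-mysql.sh -- point 'Startup Applications' at this script
# give the encrypted home folder a moment to finish mounting after login
sleep 15
# requires a sudoers entry along the lines of:
#   youruser ALL=(ALL) NOPASSWD: /usr/sbin/service mysql start
sudo service mysql start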

Configuration changes:

# stop mysql
$ sudo service mysql stop

# backup mysql data folder and config file
$ sudo cp -r /var/lib/mysql /var/lib/mysql_backup
$ sudo cp /etc/mysql/my.cnf /etc/mysql/my.cnf_backup

# move mysql data folder
$ sudo mv /var/lib/mysql /home/youruser/mysql

# change ownership of folder
$ sudo chown -R youruser /home/youruser/mysql

# config changes to my.cnf
$ sudo vi /etc/mysql/my.cnf

Changes to my.cnf:

  • socket = /home/youruser/mysql/mysqld.sock (there are multiple occurrences)
  • pid-file = /home/youruser/mysql/mysql.pid
  • user = youruser
  • datadir = /home/youruser/mysql
  • log_error = /home/youruser/mysql/mysql_error.log

# start mysql
$ sudo service mysql start

# test everything out...
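
# a sketch: one way to confirm MySQL is really serving from the new location
# (adjust credentials as needed)
$ mysql -u root -p -e "SHOW VARIABLES WHERE Variable_name IN ('datadir','socket');"
# expect datadir = /home/youruser/mysql/ and socket = /home/youruser/mysql/mysqld.sock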

# when you are sure it is working
$ sudo rm -rf /var/lib/mysql_backup

Why encrypt the MySQL data directory?

Computer equipment, particularly laptops, gets stolen all the time. As a developer, your machine probably contains dozens of sensitive passwords, API keys, SSH keys, and so forth. Most are probably dev accounts, but a few live passwords might be floating around too. For this reason I keep all my files in the encrypted home folder (as it is meant to be used).

Local databases on your machine are another potentially huge source of sensitive information. The degree to which a dev database should be locked down really depends on the nature of the business. Talk to your manager about it if you are unsure.

What I like about this solution is that, since the entire data folder is encrypted, it automatically covers any new databases going forward. The technique is not unique to MySQL; most database platforms let you store data in a user-defined location.

Is Ubuntu’s encryption of the home folder bulletproof?

See the following links for more information:
http://www.linux-mag.com/id/7568/
http://security.stackexchange.com/questions/41368/is-encrpyting-home-sufficient
https://help.ubuntu.com/community/EncryptedHome

Nothing is likely to stop serious hackers or the NSA. However, putting sensitive data into the encrypted home folder is a reasonable precaution a professional should be expected to take.

Saying –

“My laptop was stolen which contained all customer email addresses… *sorry*.”

Sounds MUCH worse than  –

“My laptop was stolen and the data was encrypted with AES 128-bit encryption, making it very, very unlikely that anybody, including computer experts, small nation states, and powerful corporations, will be able to access anything.”


What about using a cloud database for development?

Hosting your dev database in the cloud keeps sensitive data off your machine, and the option is becoming increasingly affordable. Depending on latency to the cloud it can slow down day-to-day development work. If you do use cloud servers for development, make sure to connect over an encrypted connection! Otherwise everything that goes back and forth can be eavesdropped on. A VPN, SSH tunnel, or MySQL SSL connection will do the trick.
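
For example, an SSH tunnel is quick to set up (host names, ports, and user names below are placeholders; 3307 is just a free local port):

# forward local port 3307 to MySQL on the remote dev box, over SSH
$ ssh -N -L 3307:127.0.0.1:3306 youruser@dev-db.example.com

# then point your app or client at the tunnel
$ mysql -h 127.0.0.1 -P 3307 -u devuser -p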

Correct use of PHP’s ‘at’ operator with speed benchmark

In PHP, placing an @ symbol in front of an expression (variable or function call) tells PHP to suppress any error messages that expression generates. I find this to be a handy piece of syntactic sugar. When used correctly, the gains in code readability far outweigh the costs in terms of performance (which I benchmark below). Some people argue that suppressing errors is a mistake and can mask problems, and that this technique should therefore never be used. I agree with the idea that suppressing errors is bad. At the same time, if I don’t care whether something in a four-level nested array is null, then suppressing PHP’s chatter is doing me a huge favor.

Let’s look at an example of where the @-operator shines. Consider trying to get a value out of a nested array which may or may not be set, such as $response['STATUS']['ERRORS']['ERROR_COUNT'], a typical thing to see in SOAP-based XML responses from enterprisey APIs.

One approach might be:

if(isset($response) &&
   isset($response['STATUS']) && 
   isset($response['STATUS']['ERRORS']) && 
   isset($response['STATUS']['ERRORS']['ERROR_COUNT'])) {
	$error_count = $response['STATUS']['ERRORS']['ERROR_COUNT'];
}

That said, isset() handles this shorter version without complaint. Thank you to my friend for pointing this out!

if(isset($response['STATUS']['ERRORS']['ERROR_COUNT'])) {
	$error_count = $response['STATUS']['ERRORS']['ERROR_COUNT'];
}

With the @-operator:

$error_count = @$response['STATUS']['ERRORS']['ERROR_COUNT'];

I like the last method because it is the cleanest; I don’t care whether $error_count is zero or null. The @-operator, being a somewhat lazy technique, pairs well with another of PHP’s lazy-at-best, deeply-flawed-at-worst ‘features’: NULL, "0", 0, array(), and false are all ‘falsey’ and can be used interchangeably in comparisons with plain ‘==’. Using three equal signs (‘===’) also takes the types of the variables into account, and that is generally the preferred way to compare things, but that level of precision isn’t always required.

Notes about the @ sign in PHP:

  • If you declared a custom error handler with set_error_handler(), it will still get called.
  • It only works on expressions (things that give back a value), so it does not work on if/then statements, loops, class structures, etc. This was a wise choice by the PHP community.
  • The fact that it only works on expressions greatly reduces the unanticipated side effects that can result. In this sense it is nothing like ON ERROR RESUME NEXT, an infamous language feature in Visual Basic and Classic ASP, which chugs past errors. The previous error can still be checked for in a sort of poor man’s try/catch block. ON ERROR RESUME NEXT sucks and makes me want to hurl just thinking about it.

Some people really hate the @-operator:

Most of the arguments against the @-operator come down to misuse followed by overreaction. The fact is, inexperienced and inept programmers can take any language feature and come back with a hairball of unmaintainable code.

As I demonstrated above, the @-operator is great when digging through arrays such as complex DOM objects. This is especially true with optional keys. It should not be used when calling external resources like the file system, database, APIs, etc. In those situations, try/catch blocks should be used to make sure if something goes wrong it gets logged and cleaned up properly. The @-operator is not a substitute for a try/catch!

The second major knock against the @-operator is the alleged performance penalty. Let’s do some benchmarking:

laurence@blog $ php -v
PHP 5.3.24 (cli) (built: Apr 10 2013 18:38:43)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2013 Zend Technologies

laurence@blog $ cat php-at-operator-test.php
<?php
error_reporting(E_ALL ^ E_NOTICE);

$OPERATIONS = 100000;

// test using @-operator
$time_start = microtime(true);
for($i=0; $i<$OPERATIONS; $i++) {
  $error_count = @$response['STATUS']['ERRORS']['ERROR_COUNT'];
}
$duration = (microtime(true) - $time_start);

echo "With the @-operator:" . PHP_EOL;
echo "\tTotal time:\t\t" . $duration . PHP_EOL;
echo "\tTime per operation:\t" . number_format($duration / $OPERATIONS, 10) . PHP_EOL;
echo PHP_EOL;


// test using isset()
$time_start = microtime(true);
for($i=0; $i<$OPERATIONS; $i++) {
        if(isset($response['STATUS']['ERRORS']['ERROR_COUNT'])) {
             $error_count = $response['STATUS']['ERRORS']['ERROR_COUNT'];
        }
}
$duration = (microtime(true) - $time_start);

echo "Using isset():" . PHP_EOL;
echo "\tTotal time:\t\t" . $duration . PHP_EOL;
echo "\tTime per operation:\t" . number_format($duration / $OPERATIONS, 10) . PHP_EOL;
echo PHP_EOL;
laurence@blog $ php php-at-operator-test.php
With the @-operator:
        Total time:             0.19701099395752
        Time per operation:     0.0000019701

Using isset():
        Total time:             0.015001058578491
        Time per operation:     0.0000001500

For my limited testing with PHP 5.3.24 on a 6-core box, it looks like the @-operator is ~13 times slower than using isset(). That sounds like a lot, but let's look at the penalty per use, which is 0.0000018201 seconds, or ~1.82 microseconds. An application could do approximately 550 @-operator uses and it would impact the response time by just 1 millisecond. If a single page request does 550 @-operator look-ups and every millisecond counts, then you have a problem. Probably what matters more is overall memory consumption, transactionality, caching, code cleanliness, ease of maintainability, logging, unit tests, having customers, etc... Still, it is good to have a solid measure when arguing the case either way. In the future, as CPUs get faster and cheaper, I expect the performance penalty to shrink.

Flash Boys by Michael Lewis

For anyone interested in code, networking, and finance, Flash Boys is a real page-turner. For me personally, with interests in all three, it sent chills up my spine. I could not put it down!

Flash Boys is a fascinating, informative, edge-of-your-seat ride through the modern world of technology-driven high frequency trading. I really enjoy Michael Lewis’ books, and this one, his latest, published in March 2014, is the best so far.

The Back Story:

Since the mid-1980s, trading has been increasingly handled by computers instead of people. Starting around 2007 there was a huge disruption in the way stocks are traded on exchanges, precipitated by new SEC rules and advances in fiber optic networking. This led to a surge in trading volume, all of it automated by computer. Clever ‘high frequency’ traders figured out how to exploit the slower players (everyone else) based on network latency and order manipulation. In a fine example of capitalism’s creative destruction, the high frequency trading firms were able to exploit a weakness in the market and shave off billions in profit. It is unfortunate for investors that the new players in the market did not correct the inefficiency, but instead used it for exploitation and in some senses made the problem even worse.

I don’t want to give away any of the story because it was such an enjoyable read. You won’t be disappointed by the way tech talk is presented. Some of the heroes in this true story are developers and systems administrators. Go nerds!

Lessons for all Software Professionals:

One issue the book brings to light is the economic consequence of ignoring software quality and long-term vision when it comes to system maintenance. Many of the trading platforms and exchanges out there were not written to cope with the complexity and speed of today’s world. This may not come as a surprise, but non-technical Wall Street managers are driven by short-term personal gain in the form of fat bonuses. As such, they end up with core systems that are built completely piecemeal, each feature bolted onto the next. Sound familiar?

The piecemeal approach to building software is commonly found in any non-technology company that uses technology. All companies in today’s world are forced to use technology to stay competitive, but few are good at managing that technology for the long term. On Wall Street it has become a systemic problem and is to blame for what are becoming routine ‘system glitches’ that send markets spiraling for ‘inexplicable reasons’. The Knight Capital ‘glitch’ that lost $440 million is a great example. NASDAQ and other exchanges routinely have serious flaws that are now looked at as the cost of doing business. It is SCARY to think how much money flows through these systems each day; the planet’s economic security depends on them. Technology is easy to blame (especially for managers who don’t understand it), but what is actually to blame is the way in which the technology is being managed. The book goes into this issue in detail from multiple viewpoints, and it was refreshing to see it brought up.

Hope you enjoy reading Flash Boys!

Ever heard of inodes? You need lots of them.

I ran into a situation on a customer’s CentOS server the other day where a service wasn’t working. Symptoms and error messages indicated the disk was full. However, ‘$ df -h’ was showing ample free space. What the heck? It turned out the maximum number of files on the disk had been consumed. Technically speaking, the limiting factor was the number of inodes allocated to the volume. An inode is taken up for each file, directory, and link on the file system. Inodes act like a database for the files on a file system and contain pointers to the actual information.

When a partition is created, the maximum number of inodes is established and is effectively set in stone; there is no way to add inodes to an existing file system on the fly. In this particular case the volume was 75GB with 23GB free, but only 1,000,000 inodes were allocated to it. The temporary solution was to remove old files that were not needed, to get the total number of files on the partition safely back below 1M. As soon as that was taken care of, the system started working again.
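
If you are creating a file system from scratch and expect huge numbers of small files, the inode count can be chosen up front. A sketch for ext4 (the device name is a placeholder, and mkfs of course wipes the partition):

# specify the total number of inodes explicitly
$ sudo mkfs.ext4 -N 4000000 /dev/sdb1

# or size the inode table by bytes-per-inode (here, one inode per 16 KB)
$ sudo mkfs.ext4 -i 16384 /dev/sdb1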

Unix/Linux (and Mac of course) have the inode concept built into their file systems. To check the inode status, run ‘$ df -i’ and make sure you are not at risk of running out of those precious inodes.

user@host.com [~]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda             49152000 8771724 40380276   18% /

inode related commands:

‘$ ls -i’ will output the inode IDs for each file / directory.

user@host.com [~]# ls -i1
 1725516 access-logs@
 1721190 backups/
 1720340 dead.letter
 1720652 etc/
 2173459 logs/
 1720654 mail/
 1720648 public_html/
41845314 python@
 1729306 ssl/
 1720653 tmp/
 1720660 www@

The stat command gives more details about a particular file / inode.

user@host.com [~]# stat public_html
  File: `public_html'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 800h/2048d      Inode: 1720648     Links: 13
Access: (0750/drwxr-x---)  Uid: ( 1058/user)   Gid: (   99/  nobody)
Access: 2011-12-04 16:29:56.000000000 -0500
Modify: 2014-04-20 03:19:04.000000000 -0400
Change: 2014-05-17 00:00:11.000000000 -0400

To get a count of the inodes per folder under the current directory:

user@host.com [~]# find . -type f -printf "%h\n" | cut -d/ -f-2 | sort | uniq -c | sort -rn
   5789 ./public_html
    557 ./mail
    555 ./tmp
    205 ./logs
     75 ./.cpanel
     43 ./etc
     25 .
     13 ./.sqmaildata
     10 ./.fontconfig
      6 ./.subversion
      6 ./.gnupg
      6 ./.fantasticodata
      5 ./.htpasswds
      3 ./backups
      2 ./.emacs.d
      1 ./.ssh
      1 ./public_ftp
      1 ./.cpan

This can take forever so you may want to direct the output to a file (assuming you can spare an inode):

user@host.com [~]# find . -type f -printf "%h\n" | cut -d/ -f-2 | sort | uniq -c | sort -rn > inode_count.txt

For more information:
http://www.linux.org/threads/intro-to-inodes.4130/

Gist CSS for WordPress That Looks Better

The following Gist is the CSS I’m using on my WordPress blog to improve the way Gists look. I found the default Gist CSS to render too large and unwieldy.

The improved CSS sets the maximum height of the gist to 500 pixels. It also reduces the font size and line height so it is more compact. Inspired by: https://gist.github.com/wataru420/2048287

Hope it helps!

Update 10/9/2014 – GitHub must have changed their CSS. Removing the line-height entry from the .gist div fixed it.

Grunt – for automating builds in Front End land

Grunt is a front-end build tool I’ve used on the last several projects. It handles CSS / JavaScript minification, concatenation, and linting really well. Some of my legacy projects use a combination of bash and the Yahoo UI Compressor, which I’m now switching away from in favor of Grunt.

What I liked about Grunt from the start is that it is 100% command line based! I had never seen a front-end tool that lives on the command line before. That alone got me excited, but it gets better: Grunt is versatile thanks to its plugin architecture. There are over 2,750 Grunt plugins at the time of this writing. For example, Grunt can run unit tests, be set up as a ‘watch’ to automatically build SASS while you develop, and even run PHP, Ruby, and Python tasks.

Grunt runs on Node:

Grunt depends on Node and npm (the Node package manager). It is very simple to get started:

$ npm install -g grunt-cli

Then you drop a Gruntfile.js into the root of your project and start configuring.

Here is a sample Grunt script.

This script combines the web app’s JavaScript and CSS files into production-ready files. This is in accordance with the YSlow recommendations for limiting the number of .js and .css files a web application downloads the first time it loads. It also has a lint task, which checks the JavaScript I wrote for obvious problems and stylistic errors.
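
The tasks behind a script like that are pulled in through npm. A typical install might look like this (the plugin names are common choices, not necessarily the exact ones my script uses):

$ npm install --save-dev grunt grunt-contrib-uglify grunt-contrib-cssmin grunt-contrib-concat grunt-contrib-jshint grunt-contrib-watch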

To kick it off:

$ grunt minify

This results in the built JavaScript and CSS files landing in the /build/ folder.

To run the lint task (powered by jshint in this case):

$ grunt lint

If opening an extra terminal window gets annoying, there is a plugin available for Sublime Text: sublime-grunt.

For those of you coming from the Java world:

Grunt works a lot like Ant. It does the same things in terms of automating the build process, compilation (well in this case minification), cleaning the build folder, and running unit tests.

There is a companion tool called Bower which reminds me of Maven in the way it resolves dependencies. A second companion tool called Yeoman works similarly to Maven archetypes in that it provides pre-built projects with the scaffolding set up.

The trifecta – Grunt, Yeoman, and Bower:

Grunt by itself is just a build system, but combined with Yeoman (‘yo’ for short) and Bower it gets a lot more powerful. Descriptions of each from the Yeoman website:

  • “yo scaffolds out a new application, writing your Grunt configuration and pulling in relevant Grunt tasks and Bower dependencies that you might need for your build.”
  • “Grunt is used to build, preview and test your project, thanks to help from tasks curated by the Yeoman team and grunt-contrib.”
  • “Bower is used for dependency management, so that you no longer have to manually download and manage your scripts.”


Other thoughts:

At the moment, npm is a bit like the Wild West meets Woodstock. The progressive free love that is the npm ecosystem continues to crank out new packages and interwoven dependencies at a staggering rate. No one person or company is in control of the endless supply of new packages and plugins that are available. That makes it great. It also makes it unstable and insecure. See my post on Software Ghettos for some thoughts on using open source projects of all shapes and sizes as dependencies.

On rare occasions it is frustrating when something goes wrong with Grunt. If you are lucky it is due to a version mismatch in the local environment and ‘$ npm cache clean‘ might fix it. The error messages can be vague and misleading. I have run into situations where a fix was available but not yet published to npm or even merged into the plugin’s repo. In these cases I had to override the version manually or do some other hacky fix to get going again. I have also noticed subtle differences between Windows / Mac / Ubuntu in the way the CSS / SASS related plugins operate. In those cases I deferred to building on Mac. (I really should have documented the issue and made a blog post about it. I wrote it off at the time as a fluke, so take that last observation with a grain of salt.)

All in all, Grunt is a great tool. I use it, my life is better, my clients benefit, and releases proceed as planned.

