Converting from Unix epoch to SAS datetime

I recently needed to convert a long value stored as an offset from the Unix epoch (in milliseconds) to a SAS datetime value. This comes up more and more when dealing with NoSQL and other datasets. It wasn't a very easy thing to find, but it was easy enough to do, so I'm sharing it here in case anybody else finds it useful.

The code below:

  1. Reads a dataset with the numeric milliseconds value,
  2. Divides by 1000 (SAS is seconds, timestamps are milliseconds),
  3. Uses the INTNX function to add the 10 years between the epochs in days, because SAS datetimes are based on 1960, not 1970 (there are 3 leap days between 1960 and 1970; using "dtyear" with an increment of 10 would only account for 2 of them),
  4. And finally, adds the GMTOFF function to localize the SAS datetime value for the current system (Unix epoch values are always computed from GMT, while SAS datetimes are based on the local offset).

data converted(drop=millis);
set source;
format sas_dt datetime19.;
sas_dt = intnx('dtday', millis/1000, 365*10+3, 's') + gmtoff();
run;

And for those of you who are generating content in SAS for other systems to consume, here is the function performed in reverse (generates millisecond values from the SAS datetime):

data output(drop=sas_dt);
set source;
format millis best15.;
millis = intnx('dtday', sas_dt - gmtoff(), -(365*10+3), 's') * 1000;
run;
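The offset arithmetic can be sanity-checked outside SAS. Here's a quick Python sketch of both directions (ignoring the GMTOFF localization, which depends on the running system):

```python
from datetime import datetime

# 10 years of days plus the 3 leap days (1960, 1964, 1968), in seconds
EPOCH_OFFSET = (365 * 10 + 3) * 86400

def millis_to_sas(millis):
    """Unix epoch milliseconds -> SAS datetime seconds (GMT)."""
    return millis / 1000 + EPOCH_OFFSET

def sas_to_millis(sas_dt):
    """SAS datetime seconds (GMT) -> Unix epoch milliseconds."""
    return (sas_dt - EPOCH_OFFSET) * 1000

# The Unix epoch (0 ms) should land exactly on 01JAN1970:00:00:00
# when counted from the SAS epoch of 01JAN1960:00:00:00.
delta = datetime(1970, 1, 1) - datetime(1960, 1, 1)
assert millis_to_sas(0) == delta.total_seconds() == 315619200
assert sas_to_millis(millis_to_sas(1356998400000)) == 1356998400000
```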

Augment your local backups with S3 & Glacier

I've knowingly deleted some things I probably shouldn't have. But I haven't accidentally "lost" a single bit since the 10th grade. What I did lose back then was pretty traumatic.

Since then, I’ve:

  1. Kept multiple drive copies in multiple locations.
  2. Acquired fireproof & waterproof safes. Big ones.
  3. Used OS backup solutions like Time Machine, Carbonite & rsync.
  4. Enabled Dropbox, Google Drive or OneDrive syncing from various mobile devices.

The commercial solutions have plenty of support & documentation available online. The rsync method I use is largely based on an excellent post I found.

I hadn't revisited these setups in quite some time. Recently, though, I did a little directory clean-up and investigation, and I didn't anticipate what would happen next. Seemingly simple moves of some video & picture files kicked off I/O storms downstream across my systems. The local network activity wasn't so bad, but Carbonite, in particular, decided it wanted to upload pretty much everything again.

This activity got me thinking about what some of the social applications on mobile devices actually *do* when you share a picture or video. You aren't sharing the original, raw content. My computers were: multiple machines held full NEF and AVCHD copies in locations where jpeg or mp4 renditions would easily have served. I was sharing & providing redundancy at the same time; perhaps too much of the latter.

As the title suggests, my updated process involves S3 & Glacier. I also have a transcoding step for converting images and videos to smaller formats more appropriate to copy and share locally.

An Amazon blog article outlines the mechanics of creating S3 buckets whose contents archive to Glacier.

Now, when new media is downloaded from phones, DSLRs or video cameras, it undergoes a transcoding step.

Example video transcoding script:

import os
from subprocess import call

BASE_DIR = '/home/cuppett/Videos'
AVCONV = '/usr/bin/avconv'

for (dirpath, dirnames, filenames) in os.walk(BASE_DIR):
  for filename in filenames:
    if filename.lower().endswith('.mts') and filename[:-4] + '.mp4' not in filenames:
      print 'Processing: ' + dirpath + os.sep + filename
      call([AVCONV, "-i", dirpath + os.sep + filename, "-vcodec", "h264", 
        "-acodec", "mp3", dirpath + os.sep + filename[:-3] + "mp4"])

Example rsync script with S3 video upload:


# Decrement image backups
rm -fr /data/cuppett/backup/backup.2
mv /data/cuppett/backup/backup.1 /data/cuppett/backup/backup.2
mv /data/cuppett/backup/backup.0 /data/cuppett/backup/backup.1

# Copy all files except transcoded MP4 to external storage
rsync -aAv --stats --safe-links --exclude='.*' --exclude '*.mp4' --delete \
  --link-dest=/data/cuppett/backup/backup.1 /home/cuppett/ \
  /data/cuppett/backup/backup.0

# Synchronize AVCHD to S3
s3cmd sync --exclude='*.mp4' /home/cuppett/Videos/ s3://<backup-bucket>/Videos/

The above ensures I have a local backup of the original AVCHD files as well as videos uploaded to S3 (archived to Glacier with a zero-day lifecycle setting). A similar workflow exists for photos.
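For reference, the zero-day archival mentioned above is just an S3 lifecycle rule on the bucket. A minimal sketch of what that rule looks like (the bucket name and prefix are hypothetical; with today's boto3 you would hand this structure to put_bucket_lifecycle_configuration):

```python
# Sketch of the zero-day Glacier transition rule; prefix is hypothetical.
lifecycle = {
    'Rules': [{
        'ID': 'archive-videos-immediately',
        'Filter': {'Prefix': 'Videos/'},
        'Status': 'Enabled',
        # Days=0 sends objects to Glacier as soon as the lifecycle runs
        'Transitions': [{'Days': 0, 'StorageClass': 'GLACIER'}],
    }]
}

# Applying it (requires AWS credentials; bucket name is a placeholder):
# import boto3
# boto3.client('s3').put_bucket_lifecycle_configuration(
#     Bucket='my-backup-bucket', LifecycleConfiguration=lifecycle)
```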

To me, photos and videos are the most precious, yet require very little security on my part. But, I guess that could largely depend on what kind of photos and videos you take. :)

Personal files (financial, legal, etc.) are handled differently. The same types of workflows apply; however, these files lend themselves more readily to a compression and/or encryption step as well. My personal files are both compressed and encrypted before being uploaded to my S3 backup bucket.

Source code isn't a problem for everybody. For the files my livelihood depends on, my employer provides a great deal of protection and likely goes to lengths I'm not aware of. For private projects and scripts, I use Carbonite and GitHub for recovery and redundancy. I'm always intrigued, when something goes wrong on a large open-source project, by the glimpse we get into their recovery techniques and their ability (or not) to get back to a last-known-good state.

By doing the steps outlined above,

  1. I have been able to downgrade my Carbonite plan, which no longer serves as the primary cloud copy. The difference between packages versus the new Glacier footprint fee was definitely in my favor, and the convenience of instant restore & continuous backup for current files from this service is preserved.
  2. I have freed up significant space on my laptop. While I have a large (512GB) drive there, it was approaching capacity with the videos & pictures in their highest-fidelity formats.
  3. I have less reliance on any single cloud provider and have significantly more control over my data.
  4. Current documents are shared, where appropriate, via OneDrive, Google or Dropbox with automatic synchronization; however, compressed and encrypted bundles of archive data are preserved via S3/Glacier.

I hope this post was helpful, provoked some thought about your own backup and recovery process, or offered a tip or two. Thanks for reading!

Carbonite Backup/Restore Put to the Test

I've been a subscriber to Carbonite's Home Premier plan for almost a year. Recently, I purchased a replacement hard drive for my aging Hitachi Deskstar. I figured now was as good a time as any to test my local mirror backup. After all, if anything failed, I still had two good hard drives and could use Linux to dd-copy the volume if anything went wrong… and boy, was it a good idea to test.

First, some background: in the fall, I had upgraded to Windows 8 (when you could get the full Pro license for $40 net from Amazon). Immediately, my mirror stopped processing because the software did not support Windows 8 at the time. Soon after, a software upgrade was released and it began backing up again. Whenever the drive was plugged in, it would actively copy files as before. However, to my surprise, when using the bare metal restore disk, it indicated no new snapshots were available since my OS upgrade.

First note to any existing Windows 8 Carbonite users: If you’ve upgraded to Windows 8 and made this transition in the software as I have (I’d imagine very few have), wipe your drive and get a fresh snapshot. Chances are your mirror image isn’t as usable as you think.

I also purchased a new disk to use in my external enclosure for mirror backups. I reconnected my old drive and started a new mirror image on this new backup disk. Upon booting into my old copy of Windows 8, I attempted to select a new mirror image. Previously, I had been using eSATA for backup and restore with great success. However, it turned out the recognition process only considers USB volumes, so I had to connect over USB and wait out an initial snapshot to prime my new mirror volume. What a pain. Given that the software recognizes the drive thereafter by serial number, I consider this a pretty big deficiency: why not let me mirror to internal or external drives regardless of bus connectivity? Weird. In any case, the snapshot taken immediately afterward was usable to move my OS and programs to the new disk.

In the end, I was able to use the mirror process to perform a bare metal restore. However, if I had actually been down a hard drive and only had the mirror to rescue me, all I would have had was a 4-month-old mirror backup to start from. Re-applying the Windows 8 upgrade and using the cloud to pull the rest might have worked and would have been an interesting data point for this post, but I didn't want to spend the time checking it out mid-restore and cause myself more pain. At some point I may re-attach my old mirror volume (still in its defunct state), but I'll save that for another day.

Lesson learned: trust but verify. Check out your bare-metal restore process (if you have one) every once in a while… especially if you are paying for it!

Goodbye Google Apps – Free Edition

Recently, Google announced it was sunsetting the free edition of its Google Apps for Business. In my opinion, this is a really good thing. It's a wake-up call that should send ripples through the industry. Free doesn't work except in a few verticals like search; the ad revenue just won't cut it.

While I am grandfathered in on a couple of domains, I am close to my limits and will have to consider alternatives in the near future. I may be wrong, but it has appeared that since I ceased running my own anti-spam gateways, mail servers, etc., the industry as a whole has stagnated. As side offerings, they could probably afford to; however, I know for the little guys (like I was) it was impossible to compete with free. Now, though, there's a line: $4.16/mo/user. This is what you are competing with. It also appears other services have noticed, and their pricing has moved accordingly over the last 20 days. Time to dust off the bat book? I'm guessing that, with respect to e-mail, prices will stagnate for a while, but I can imagine they will drop as new competitors steal market share and provide additional benefits.

Hopefully more services go this route. I've mentioned to colleagues for a while that services such as Twitter and Facebook should charge companies for their "pages" or individuals for "verified accounts". Those companies receive plenty of value through the endorsements these services generate among consumers. I know I'd pay a nominal fee to Twitter, Facebook and LinkedIn to verify myself across the web to my colleagues, even though I'm not important. It should also usher in more value relating to privacy and our data for the people who should be the customers: us. After all, we'll be paying for it!

Automatically Clean Old GMail

I really hate the way the Internet accumulates information about us. 30 years from now, I don't want my kids to be able to pull up some tirade from a comment thread or a very compromising photo. Despite my efforts, I bet they will. This is why I try to automate filtering as much as I can into my local (or private cloud) storage. Then, every once in a while, you can go through the old stuff and prune it down to only what's useful, or go flat-out scorched earth on entire service families.

One thing I have found tedious is remembering to occasionally prune old Gmail history. I remove absolutely everything older than two years. I did a couple of quick searches for a script, but didn't get any hits, so I finally wrote one that does it for me and scheduled it in cron. It took me enough work jiggering around with setting labels, removing labels and flat-out deleting before finally arriving at the working "just copy it to the Trash folder" method that I thought it warranted sharing.

from imapclient import IMAPClient
from datetime import timedelta, date

# Computing the cutoff date, two years back
now =
d = timedelta(days=(-365 * 2))
two_years = now + d

# Gmail IMAP delete
CUTOFF = 'before:' + two_years.isoformat()
HOST = ''
USERNAME = ''  # your Gmail address
PASSWORD = 'your-password'

server = IMAPClient(HOST, use_uid=True, ssl=True)
server.login(USERNAME, PASSWORD)
server.select_folder('[Gmail]/All Mail')
messages =['X-GM-RAW ' + CUTOFF])
if len(messages) > 0:
    server.copy(messages, u'[Gmail]/Trash')
server.logout()

print '%d messages deleted' % (len(messages),)

I don't save e-mail indefinitely outside of work, but the script could be altered a little to save out the contents if you wanted to archive them somewhere other than the Google farm. It also plays nice in that it just puts the messages into the Trash folder (to be automatically purged by the service after another 30 days), but it could be changed to really delete things for good without the Trash step.
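If you did want to save the contents first, a minimal sketch of the archive step might look like the following (modern Python 3 / IMAPClient style, unlike the Python 2 script above; the output directory and filename scheme are my own invention):

```python
import os

def archive_messages(server, uids, out_dir):
    """Fetch raw RFC822 bodies and write one .eml file per message UID."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # IMAPClient's fetch() returns a dict keyed by UID; recent versions
    # key the per-message response data by bytes (b'RFC822').
    for uid, data in server.fetch(uids, ['RFC822']).items():
        path = os.path.join(out_dir, '%d.eml' % uid)
        with open(path, 'wb') as f:
            f.write(data[b'RFC822'])
        paths.append(path)
    return paths
```

Call it on `messages` just before the `server.copy(...)` line, so the originals are on disk before they head to the Trash.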

Default PostgreSQL String Sort Order Bites Me in the SAS

During the development of an internal mixed PHP and SAS application, I went through some interesting transitions. Notably:

  • MySQL -> PostgreSQL
  • LATIN1 -> UTF-8
  • SAS 9.1.3 -> SAS 9.2

Most of these transitions went pretty smoothly. However, one bug got introduced somewhere along the way, and I could never seem to figure out what caused it.

For some reason, when downloading a list of features to get real ID numbers and then matching by name, this wouldn't work:

PROC SORT data=PG.features(RENAME=(id=feature_id name=feature_name)) out=features;
LABEL feature_id="feature_id";
BY feature_name;
WHERE release_id=&release_id;
RUN;

DATA folders(KEEP=feature_id name);
MERGE folders(IN=in1) features(IN=in2);
BY feature_name;
IF in1;
IF in2;
RUN;

I’d get bizarre errors out of SAS that the list wasn’t sorted. Whenever it occurred, I’d inspect the resultant (and intermediate) datasets, and everything seemed sorted just fine. Instead, I had to have something like this:

/* Downloading the features for this release */
PROC SORT data=PG.features(RENAME=(id=feature_id name=feature_name)) out=features;
LABEL feature_id="feature_id";
BY feature_name;
WHERE release_id=&release_id;
RUN;

/* 9.2 workaround? For some reason if I sort on a RENAME or don't, then try to */
/* MERGE after RENAME on a sorted field, it won't work. */
PROC SORT data=features;
BY feature_name;
RUN;

PROC SQL;
CREATE INDEX feature_name ON features(feature_name);
QUIT;

DATA folders(KEEP=feature_id name);
MERGE folders(IN=in1) features(IN=in2);
BY feature_name;
IF in1;
IF in2;
RUN;

You can clearly see my frustration (and the blaming of my employer’s own software over Postgres) in the comments. In addition, I probably overkilled the solution by re-sorting and then also creating an index, but it did make the problem go away.

Eventually, I got another bug report from a user with the phrasing, "Incorrect sorting within letter group in Features table". That led me to an entry in the Postgres wiki on locales and collation.

I discovered SAS sorts strings based on the rules of "C" locale collation. Even though I had read some documentation describing the default LC_COLLATE setting as "C", for my database it was in fact set to "en_US.UTF-8". What this basically means is that when sorting the following list:

  • GLMMOD : Tests
  • GLM : ODS Graphics
  • GLM : Checklist
  • GLMMOD : Checklist

You’ll get:

  1. GLM : Checklist
  2. GLMMOD : Checklist
  3. GLMMOD : Tests
  4. GLM : ODS Graphics

Which seems incorrect at first, until you realize en_US collation disregards whitespace and special characters when sorting. However, SAS and "C" locale collation sort like this:

  1. GLM : Checklist
  2. GLM : ODS Graphics
  3. GLMMOD : Checklist
  4. GLMMOD : Tests
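The difference is easy to reproduce. Here's a rough Python sketch (the strip-the-punctuation key is only an approximation of en_US collation, not the real glibc/ICU rules):

```python
import re

names = [
    'GLMMOD : Tests',
    'GLM : ODS Graphics',
    'GLM : Checklist',
    'GLMMOD : Checklist',
]

# "C" locale: plain code-point comparison; spaces and colons count
c_order = sorted(names)

# en_US-style: compare only the alphanumeric characters (approximation)
en_order = sorted(names, key=lambda s: re.sub(r'[^A-Za-z0-9]', '', s).lower())

assert c_order == ['GLM : Checklist', 'GLM : ODS Graphics',
                   'GLMMOD : Checklist', 'GLMMOD : Tests']
assert en_order == ['GLM : Checklist', 'GLMMOD : Checklist',
                    'GLMMOD : Tests', 'GLM : ODS Graphics']
```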

Because PROC SORT and other SAS mechanisms issue and rely on native database commands for some operations, this behavior can produce orderings undesirable for SAS. SAS actually performed admirably here: it delegated the sorting to the database and set the appropriate flags on the dataset, yet still caught the match-merge problem at runtime!

Long story short: when using PostgreSQL with SAS, it's probably a good idea to make sure the database is created with the correct LC_COLLATE setting. If it is not, you may end up with crazy gyrations like mine in your code. Luckily for me, it's a fixable scenario: the database only needs to be dumped, then restored after it has been re-created with the desired collation.

Mac OS X Reinstall/Restore via Time Machine is Awesome!

During the holiday vacation, my MacBook started booting up with a rather lengthy progress bar; it would take about 10 minutes to complete. The operating system would come up fine, but it started making me nervous because it was occurring on every boot, not just once (like an update post-processing after a reboot). I had a feeling it was filesystem related, just based on my experiences with Unix/Linux in the past. I figured some kind of fsck was working its magic, and some quick searches seemed to confirm the thought. This was going to be my first attempt at resolving a system issue on the Mac; I'd been lucky enough to be on cruise control there until now.

I eventually stumbled on the Mac's console message viewer. It's extremely handy, and it zeroed me in on the culprit: "Incorrect number of thread records (#, #)". Okay, great message. If you know the number is incorrect, maybe you could tell me the correct number? Or just fix it? Nah.

After some more research, the next remediation steps were to boot into the install disk, run Disk Utility, and also start up a console, unmount the main volume and run fsck_hfs manually. Nada. All of those seemed to know all about my incorrect thread count, but none could make a correction. I must admit, though, with a lot of Windows (and Linux) diagnostics behind me, the utilities on the install disk are at least navigable and well presented.

With the easy stuff behind me, my options now get expensive or drastic. Most of the forum and support threads I found list DiskWarrior as the next logical step for fixing the volume. At $100, I'm not quite ready to give up. I'm close, though, because with a Boot Camp volume and a pretty important Windows 7 install on the other partition, formatting and redoing all that work doesn't sound palatable… yet.

I do have AppleCare. I use the online form, open a ticket and schedule a callback on 12/31. Cool! The phone rings. The automated attendant informs me they are closed. What!? Why let me schedule the call, then? The call ends. Two minutes later, I'm called again by Mr. Robot. Ugh. I re-schedule for another day to avoid infinite recursion! A slick customer-service interface with one bug that lets me schedule calls when they're closed for the holiday, and another that can put you in a loop.

OCD kicks in. I want this to go away now. What do I have unique to the Windows partition, really? Just a QuickBooks file; that's not a big deal, back it up separately. The Time Machine backup is a little behind; let's take a new one. Okay, I'll use the install disk to erase the Mac drive and try re-installing Snow Leopard. I'm sure my users and data will restore from Time Machine. I may have to do some homework to access my Windows partition (if it survives). I'll probably have to redo my network setup, install iLife, Firefox, Eclipse, etc. If my Windows partition doesn't survive, the next few days will be fun…

The install is pretty quick, 30-60 minutes. No problem. It restarts. Here comes my getting-started page. There's a migrate/move-via-Time-Machine button. Sure. The next screen was pretty surprising: I had options for restoring four things… something like users, data, settings and apps! Are you serious!? Wow. Something will probably not be quite right, but this is HUGE and a big divergence from my non-Mac experiences…

The Time Machine restore cranks through my files. It completes. I restart.

At the gray startup screen, I hold down "Alt". Will Windows still be there? It is! Yay! It comes up just fine. That, in and of itself, is extremely cool. I had a feeling the partition would survive, just from the screens I was seeing on the OS X install disk. I'm not sure about Vista or 7, but prior Windows versions would always blow away any bootloader I had, no question.

I then start OS X. Surprise, surprise: I must have the correct number of thread records now, because it boots in normal time :). The login screen has my users with their little thumbnails. Cool.

Logging on was what really blew my mind. Once inside, all updates were already applied, the network settings were all there, and everything/EVERYTHING was there: iLife (a separate install), an Eclipse I had downloaded and extracted somewhere funny, every other app and setting. Wow. I was so dumbfounded and filled with confidence about the whole process that if it ever takes me more than an hour to find a button to undo a setting not 100% to my liking, I will re-install from scratch and use the last backup not containing it in a heartbeat… For the Mac-savvy, or anybody who has read up on Time Machine since it came out, this may seem funny, but I still can't get over it.

It didn't escape me that, going through the motions, OS X was treating this the same as if I had purchased a new computer and was just "moving" to it with my Time Machine backup. Companies don't use Linux desktops (yet, ugh), but is it this smooth for the minority that do when moving to new hardware? I doubt it. For companies using Windows desktops, entire armies of IT guys are still trying to purchase or develop applications/processes/scripts to make moving to new hardware this easy for the employees at large when hardware refreshes are in order. I know it wasn't this smooth for me the last time I got new gear (XP -> XP). I have another Windows upgrade coming in January (XP -> 7); we'll see how that goes…

Burning a Blu-ray in Linux

I've had a Blu-ray burner in my Linux system for quite a while. Since about Fedora 8, I've been using command-line tools to burn backups onto single-layer BD-RE media. I gleaned the commands from a posting about the dvd+rw-tools (Google cache). Here are the basics…

Ad-Hoc Burning

growisofs -Z /dev/dvd -R -J /path/to/files

– later –

growisofs -M /dev/dvd -R -J /more/files

– to finalize –

growisofs -M /dev/dvd=/dev/zero

Writing an ISO

growisofs -dvd-compat -Z /dev/dvd=/path/image.iso

Erase the Disk

growisofs -Z /dev/dvd=/dev/zero


dvd+rw-format -ssa=1G /dev/dvd

I’ve recently been trying to do some new things and I thought I would post that as of Fedora 11, Brasero can recognize and write files to my BD-RE media and also erase the disk to do it again. K3b is still at 1.0.5 (not a KDE 4 compatible version) on Fedora and does not recognize the disk correctly for type and capacity, nor does it allow burning.

My recent searches pull up the same 2007 and 2008 results as before, where people were unsure of the state of things. This is the current state of my world, though.

*UPDATE 2010-06-25*

It’s been awhile since I’ve tried this, but now using Fedora 13, I am able to use K3B to burn single-layer BD-RE media at 2.0x speed. I have a Sony BDRW BWU-200S.

[cuppett@home ~]$ k3b --version
Qt: 4.6.2
KDE Development Platform: 4.4.4 (KDE 4.4.4)
K3b: 1.92.0
[cuppett@home ~]$ rpm -q dvd+rw-tools

Providing SOAP (non-REST) web services with CakePHP

I recently had a need to support a complex SOAP web service from CakePHP. Cake provides some built-in support for REST-based web services; however, this situation required more. This post shows how to set this up in your own projects while still utilizing all your normal controller and model goodness, without too much screwing around.

Please see this attachment for the source code described in this article.

The method I will outline here requires the php-soap module.

First, the WSDL. For my project, I started with a WSDL created in another tool. My WSDL specifies a slightly different object set than my CakePHP application. I'm sure that with PHP5 and some finessing of the Model classes, you could probably use the same set; however, it was easy enough to create some really vanilla objects to house the transport data and use those to communicate with the web service. Both the WSDL and the receiving controller are present in the attachment.

What you will notice is that the *DTO objects defined in the controller file mirror the structure of the objects in the WSDL, and the methods are likewise represented in the controller. I put them in the controller file because it wasn't really obvious to me where in Cake's structure "outside code" should go. I have a separate file I pull in further up the class hierarchy, but that's about as unconventional as I want to get. Also, this controller is dedicated to handling web service requests, and I only need these *DTO objects in that case, so locality wins and they live here. No real engineering genius: their structure mimics what is defined in the WSDL file.

The real magic is in the controller. The controller's remote() method handles the POST from the web via the port binding in the WSDL file. It sets up some of the basics for SoapServer, easily identified from the PHP manual. It's even pretty easy to deduce that we'll need SoapServer->setClass() somewhere, with the name of our controller plugged in. However, there was one tidbit in the comments section of the manual regarding SoapServer->setObject(). It wasn't documented (at the time), but after experimenting and looking at the PHP source, it does exactly what we need here: it sets the handling class to an instantiated (i.e., existing) object instead of trying to spawn a new one.

Because we are already inside the CakePHP framework and running the remote() function, we already have the variables we want from beforeFilter(), we have our models loaded up, and we may even have a user context from mod_auth_something. Perfect! So we tell SoapServer to use our instantiated controller. Once the *DTO classes are mapped and SoapServer is configured, it's as simple as having it handle STDIN to tickle the rest of the methods in your controller with the parameters populated. Two tricks/problems remain: debug level & autoRender.

First, debug level. There's bound to be a way around it; however, since I test with a web service client, when I do have a problem I have to debug with lots of $this->log() calls. Turning debugging up to 1 or 2 is problematic because then CakePHP doesn't send properly formed XML back to the web service client, and the client usually takes a SoapFault when that happens. I stick to a debug level of 0 during development and deployment with respect to the web service stuff.

Second, autoRender. Because SoapServer does the actual outputting of the XML response to the client, I set the layout in the controller to Ajax and also explicitly call exit() at the end of the remote() method. This ensures CakePHP doesn't send back a "Missing View" error, a half-rendered $layout, or any other automatic goodies.

I hope this article is helpful for anybody who might want or need to integrate a more elegant/esoteric web service into their CakePHP architecture. I'm sure there are cleaner ways to do this with custom View classes, Components, etc.; however, this straightforward approach has been working really well for one of my applications.

HowTo: PostgreSQL – Adding more values to an ENUM type

I recently had trouble manipulating an ENUM field I had created in PostgreSQL. I couldn't easily find any suggestions or samples on Google or in the manual, but I was able to get it to work, so I'm posting it here. The basic premise: there is an ENUM type already in use, I need more possible values, and I have to preserve the existing values to keep code working.

Initial creation of the type and table:

CREATE TYPE var_type AS ENUM('text', 'number', 'date', 'boolean');

CREATE TABLE custom_fields (
id bigserial PRIMARY KEY,
name varchar(50) NOT NULL,
pdf_type var_type NOT NULL
);

After running with this table for some time, invariably new values are needed and there's a migration consideration. As long as you are not using the table column as a reference in a foreign key, the following should work to preserve the data, then drop and re-create the type.

The following creates a new column to hold the original text value:

ALTER TABLE custom_fields ADD COLUMN type_text varchar(15);
UPDATE custom_fields SET type_text = pdf_type::text;

We then need to drop the existing type and re-create it with the new values we want. CASCADE automatically drops columns that depend on the type:

DROP TYPE var_type CASCADE;
CREATE TYPE var_type AS ENUM('text', 'number', 'date', 'boolean', 'list');

This last part was the piece I couldn't figure out at first. When you add the column back, you have to cast the varchar column back into the ENUM type. I had tried a variety of concoctions before getting this to work:

ALTER TABLE custom_fields ADD COLUMN pdf_type var_type;
UPDATE custom_fields SET pdf_type = type_text::var_type;
ALTER TABLE custom_fields ALTER pdf_type SET NOT NULL;
ALTER TABLE custom_fields DROP COLUMN type_text;