Sunday, December 20, 2015

Gulp Basics



Gulp is easy: that is what every gulp tutorial says, until you have to troubleshoot some nasty issue and realize that beneath that simple code (gulp.js itself is tiny) there are many layers of technical machinery.


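// assumption: the gp_* names refer to the gulp-concat, gulp-rename and gulp-uglify plugins
var gulp = require('gulp'),
    gp_concat = require('gulp-concat'),
    gp_rename = require('gulp-rename'),
    gp_uglify = require('gulp-uglify');

gulp.task('processJS', function () {
    // concatenate a.js and b.js and write the combined file
    return gulp.src(["gulpTest/a.js", "gulpTest/b.js"])
        .pipe(gp_concat('ab_combined.js'))
        .pipe(gulp.dest('gulpTest/dist'))
        // then rename, minify and write a second copy
        .pipe(gp_rename('ab_uglied.js'))
        .pipe(gp_uglify())
        .pipe(gulp.dest('gulpTest/dist'));
});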

 
Gulp lures you in with this elegant, train-like code, which reads intuitively. Every gulp plugin takes something in and outputs something for the next plugin to consume, as the following graph shows:



What makes this pipeline possible is the stream, one of the core abstractions of Node.js.

Stream

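The stream module provides five stream classes:

  • Readable
  • Writable
  • Duplex (both readable and writable)
  • Transform (a special kind of Duplex whose output is computed from its input; this is the kind of stream gulp uses)
  • PassThrough (a trivial Transform that passes its input through unchanged)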
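To see the stream types side by side, here is a minimal sketch using only Node's built-in stream module. It assumes a Node.js version recent enough to have Readable.from and the simplified constructor options (transform, write). A Readable produces a few strings, a Transform appends "!", a PassThrough forwards them untouched, and a Writable collects them; the names source, exclaim, tap and sink are illustrative only.

```javascript
const { Readable, Writable, Duplex, Transform, PassThrough } = require('stream');

// Readable: a source of data. Readable.from() turns an array into an
// object-mode stream that emits each element as one chunk.
const source = Readable.from(['gulp', 'is', 'streams']);

// Transform: a Duplex whose output is computed from its input.
const exclaim = new Transform({
  objectMode: true,
  transform(chunk, encoding, cb) {
    cb(null, chunk + '!'); // shorthand for this.push(chunk + '!'); cb();
  }
});

// PassThrough: a Transform that forwards every chunk unchanged.
const tap = new PassThrough({ objectMode: true });

// Writable: a sink. This one just collects whatever it receives.
const received = [];
const sink = new Writable({
  objectMode: true,
  write(chunk, encoding, cb) {
    received.push(chunk);
    cb();
  }
});

// pipe() returns its destination, which is what makes chaining possible.
source.pipe(exclaim).pipe(tap).pipe(sink);

sink.on('finish', () => {
  console.log(received); // [ 'gulp!', 'is!', 'streams!' ]
});

console.log(exclaim instanceof Duplex); // true: every Transform is a Duplex
```

Because pipe() hands back the destination stream, each call can be chained onto the previous one, which is exactly the train-like style of the gulp task above.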

A Transform stream has two important methods:
  • _transform(data, encoding, cb) - invoked whenever the stream receives data. This is your opportunity to do whatever transformation on the data you want; you push the transformed data on down the pipeline with this.push(), and you call the callback cb to indicate you’ve finished with this piece of data.

  • _flush(cb) - invoked when there is no more data to be received (after the last _transform call has finished). You can still push final data here, and you call the callback cb to indicate you’ve finished your work.
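A small sketch of when the two hooks fire, assuming a modern Node.js where new Transform({ transform, flush }) is available (those options are installed as the stream's _transform and _flush). The counter stream and its calls log are hypothetical, made up for this illustration: each chunk is replaced by its length, and _flush emits one extra marker at the end.

```javascript
const { Transform } = require('stream');

// Records the order in which the stream machinery calls our hooks.
const calls = [];

const counter = new Transform({
  objectMode: true,
  // Called once per incoming chunk.
  transform(data, encoding, cb) {
    calls.push('transform:' + data);
    this.push(String(data).length); // emit zero or more chunks downstream
    cb();                           // this chunk is fully processed
  },
  // Called once, after end() and after the last transform() has called back.
  flush(cb) {
    calls.push('flush');
    this.push('end-of-input'); // last chance to emit buffered or summary data
    cb();
  }
});

const out = [];
counter.on('data', (chunk) => out.push(chunk));
counter.on('end', () => {
  console.log(calls); // [ 'transform:aaa', 'transform:bb', 'flush' ]
  console.log(out);   // [ 3, 2, 'end-of-input' ]
});

counter.write('aaa');
counter.write('bb');
counter.end();
```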

Here is a small exercise to get familiar with Transform streams: read a file line by line, uppercase each line, and output it.

To create a stream that takes input and uppercases it:


var stream = require('stream'), util = require('util');

var Transform = stream.Transform ||
    require('readable-stream').Transform;

function UpperStream () {
    Transform.call(this, { "objectMode": true }); 
}
util.inherits(UpperStream, Transform);


UpperStream.prototype._transform = function (data, encoding, cb) {
    console.log("i received:"+ data);
    this.push(data.toString().toUpperCase());
    cb();
};

UpperStream.prototype._flush = function( cb ){
    console.log("i've finished");
    this.push("DONE");
    cb();
}


The following code reads a file, splits it into lines (using the split module), and pipes each line through the UpperStream above to uppercase it.

var fs = require('fs');
var split = require('split');

var is = fs.createReadStream("./test.txt");
var lineIdx = 0;
is.pipe(split())
    .on('data', function (line) {
        console.log("line " + lineIdx++ + ":" + line);
    })
    .pipe(new UpperStream())
    .pipe(process.stdout);

test.txt contains two lines:

aaa bbb
ccc ddd

Running the above code outputs:

line 0:aaa bbb
i received:aaa bbb
AAA BBBline 1:ccc ddd
i received:ccc ddd
CCC DDD i've finished
DONE
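If you would rather not pull in the split module, the line-splitting step can be sketched with core modules alone. LineSplitter below is a hypothetical stand-in, not how split is implemented; it is mainly a good illustration of why _flush exists: the last line of a file may not end with a newline, so it can only be emitted once the input is known to be over.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');
const util = require('util');

// Hypothetical core-only replacement for the split module (not split's code).
// Input: arbitrary text chunks. Output: one chunk per '\n'-separated line.
function LineSplitter() {
  // Bytes in, objects out: readableObjectMode lets us push empty lines too.
  Transform.call(this, { readableObjectMode: true });
  this._decoder = new StringDecoder('utf8'); // safe across split UTF-8 chars
  this._tail = '';                           // partial line from the last chunk
}
util.inherits(LineSplitter, Transform);

LineSplitter.prototype._transform = function (data, encoding, cb) {
  const lines = (this._tail + this._decoder.write(data)).split('\n');
  this._tail = lines.pop(); // the last piece may be an incomplete line; keep it
  lines.forEach((line) => this.push(line));
  cb();
};

// Without _flush, a final line lacking a trailing newline would be lost.
LineSplitter.prototype._flush = function (cb) {
  this._tail += this._decoder.end();
  if (this._tail !== '') this.push(this._tail);
  cb();
};

// Usage: chunks deliberately do not line up with line boundaries.
const lines = [];
const splitter = new LineSplitter();
splitter.on('data', (line) => lines.push(line));
splitter.on('end', () => console.log(lines)); // [ 'aaa bbb', 'ccc ddd' ]
splitter.write('aaa b');
splitter.write('bb\nccc');
splitter.end(' ddd');
```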

through2

through2 is a module that makes creating a transform stream easier. With through2, the above UpperStream can be created as:

var through2 = require('through2');

var _transform = function (data, encoding, cb) {
    console.log("i received:" + data);
    this.push(data.toString().toUpperCase());
    cb();
};

var _flush = function (cb) {
    console.log("i've finished");
    this.push("DONE");
    cb();
};

function UpperStream() {
    // through2.obj creates an object-mode transform stream
    return through2.obj(_transform, _flush);
}
Gulp plugins typically use through2 to create their transform streams. When reading a plugin's source code, look for the call to through2.obj: the functions passed to it tell you what the plugin does.
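To demystify through2.obj a little, here is a rough approximation built on the core Transform class. throughObj is a hypothetical helper name, and through2 itself handles more cases (options, the non-object-mode variant and so on); the point is only that through2.obj(transform, flush) boils down to an object-mode Transform whose _transform and _flush are the functions you pass.

```javascript
const { Transform } = require('stream');

// Hypothetical helper approximating through2.obj(transform, flush): build an
// object-mode Transform from two plain functions. through2 itself does more.
function throughObj(transform, flush) {
  return new Transform({
    objectMode: true,
    // With no transform given, pass every chunk through unchanged.
    transform: transform || function (chunk, encoding, cb) { cb(null, chunk); },
    flush: flush // may be undefined: then there is simply no flush step
  });
}

// The UpperStream from above, rebuilt on top of the helper.
function UpperStream() {
  return throughObj(
    function (data, encoding, cb) {
      this.push(data.toString().toUpperCase()); // `this` is the stream
      cb();
    },
    function (cb) {
      this.push('DONE');
      cb();
    }
  );
}

const seen = [];
const upper = UpperStream();
upper.on('data', (chunk) => seen.push(chunk));
upper.on('end', () => console.log(seen)); // [ 'AAA BBB', 'CCC DDD', 'DONE' ]
upper.write('aaa bbb');
upper.end('ccc ddd');
```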

Gulp basics


Gulp itself is tiny: its API has only four methods.

gulp.task

gulp.task(name[, dependencies], function () {

});
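Note that the dependencies run in parallel, and the task's own function runs only after all of them have finished (provided each dependency signals completion by returning a stream or a promise, or by calling its callback). Behind the scenes, gulp uses Orchestrator to define and run tasks.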
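The dependency rule can be illustrated with a toy runner. This is a sketch under simplifying assumptions and not Orchestrator's API: task and run are hypothetical names, every task returns a promise (or nothing), and dependency cycles are not detected. A shared dependency is started only once per run, roughly mirroring how gulp runs each task once.

```javascript
// Toy task runner illustrating gulp 3's dependency rule. NOT Orchestrator's
// API: task() and run() are hypothetical, and cycles are not detected.
const tasks = {};

function task(name, deps, fn) {
  tasks[name] = { deps: deps, fn: fn };
}

// `started` maps task name -> promise, so a shared dependency runs only once.
function run(name, started = {}) {
  if (!started[name]) {
    const t = tasks[name];
    // Start all dependencies at once, wait for every one, then run the body.
    started[name] = Promise.all(t.deps.map((dep) => run(dep, started)))
      .then(() => t.fn());
  }
  return started[name];
}

// Usage: two dependencies that take different amounts of time.
const log = [];
const wait = (ms, label) => new Promise((resolve) => {
  setTimeout(() => { log.push(label); resolve(); }, ms);
});

task('slow', [], () => wait(20, 'slow done'));
task('fast', [], () => wait(5, 'fast done'));
task('default', ['slow', 'fast'], () => { log.push('default body'); });

run('default').then(() => console.log(log));
// [ 'fast done', 'slow done', 'default body' ]
```

Both dependencies start immediately, so the fast one finishes first even though it is listed second, and the body of default waits for both.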

gulp.src


gulp.src(glob[, options]);
This method reads the files matching glob and emits them into the pipeline as Vinyl file objects.

gulp.dest


gulp.dest(path);
This method writes each file it receives to the specified directory and then passes the file on, so the pipeline can continue after it.
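To make the shape of gulp.src and gulp.dest concrete, here is a heavily simplified sketch using only core modules. srcFiles and destDir are hypothetical names; unlike the real methods they take a list of paths instead of a glob, emit plain objects instead of Vinyl files, and ignore all options. What they share with the real thing: src is a readable source of file objects, and dest writes each file and then passes it on.

```javascript
const { Readable, Transform } = require('stream');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical, heavily simplified stand-ins for gulp.src and gulp.dest.

// "src": a Readable that emits one { path, base, contents } object per file.
function srcFiles(filePaths) {
  return Readable.from(filePaths.map((p) => {
    const abs = path.resolve(p);
    return { path: abs, base: path.dirname(abs), contents: fs.readFileSync(abs) };
  }));
}

// "dest": writes each file under outDir, then passes the same object on,
// so further plugins can still be piped after it.
function destDir(outDir) {
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      const target = path.join(outDir, path.relative(file.base, file.path));
      fs.mkdir(path.dirname(target), { recursive: true }, (mkdirErr) => {
        if (mkdirErr) return cb(mkdirErr);
        fs.writeFile(target, file.contents, (writeErr) => cb(writeErr, file));
      });
    }
  });
}

// Usage, inside a throwaway temp directory.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'gulp-sketch-'));
fs.writeFileSync(path.join(tmp, 'a.js'), 'var a = 1;');

const passedOn = [];
srcFiles([path.join(tmp, 'a.js')])
  .pipe(destDir(path.join(tmp, 'dist')))
  .on('data', (file) => passedOn.push(path.basename(file.path)))
  .on('end', () => {
    console.log(passedOn); // [ 'a.js' ]
    console.log(fs.readFileSync(path.join(tmp, 'dist', 'a.js'), 'utf8')); // var a = 1;
  });
```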

gulp.watch


gulp.watch(glob[, options], tasks);

This method watches the files matching glob (and options) and runs the given tasks whenever one of them changes.

Note: if you run gulp.watch from WebStorm, you may notice that after you change a file, it takes a long time for gulp.watch to notice the change and run the specified tasks. I think this is because WebStorm saves files automatically, but only after some delay, which causes the seeming slowness of gulp.watch.

So what actually flows through the gulp pipeline? Objects called Vinyl files, which have the following interface:

interface VinylFile {
    cwd : string;            // default: process.cwd()
    base : string;           // default: options.cwd
    path : string;           // default: null
    stat : fs.Stats;         // default: null
    contents : Buffer|Stream; // default: null
    isBuffer() : boolean;
    isStream() : boolean;
    isNull() : boolean;
    clone() : VinylFile;
    pipe(stream[, opt]) : Stream;
    inspect() : string;
    relative : string;
}
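With that interface in mind, here is roughly what the skeleton of a typical plugin looks like: an object-mode Transform that receives Vinyl files, checks isNull()/isStream(), modifies the contents and pushes the file on. addBanner is a hypothetical plugin, and fakeFile is a minimal stand-in for a Vinyl object (an assumption: it implements only the members the plugin uses). Real plugins usually report errors with a gulp-specific error class; a plain Error keeps the sketch dependency-free.

```javascript
const { Transform } = require('stream');

// Hypothetical plugin with the usual gulp-plugin shape: an object-mode
// Transform that takes Vinyl files in and pushes Vinyl files out.
function addBanner(text) {
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      // No contents (e.g. a directory): pass the file on untouched.
      if (file.isNull()) return cb(null, file);
      // Stream contents are not handled by this sketch.
      if (file.isStream()) return cb(new Error('addBanner: streams not supported'));
      // Buffer contents: prepend the banner and send the same file downstream.
      file.contents = Buffer.concat([Buffer.from(text + '\n'), file.contents]);
      cb(null, file);
    }
  });
}

// Minimal stand-in for a Vinyl file (assumption: only the members used here).
function fakeFile(filePath, text) {
  return {
    path: filePath,
    contents: text === null ? null : Buffer.from(text),
    isNull() { return this.contents === null; },
    isBuffer() { return Buffer.isBuffer(this.contents); },
    isStream() { return this.contents !== null && typeof this.contents.pipe === 'function'; }
  };
}

// Usage
const results = [];
const plugin = addBanner('/* built by gulp */');
plugin.on('data', (file) => {
  results.push(file.isNull() ? file.path + ' (empty)' : file.contents.toString());
});
plugin.on('end', () => console.log(results));
plugin.write(fakeFile('a.js', 'var a = 1;'));
plugin.write(fakeFile('dir', null));
plugin.end();
```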
Armed with this bit of gulp knowledge, I find it easier to understand what a plugin does, which comes in handy. Although gulp boasts of being easy, things soon get a bit “unnatural”: not everything is a stream, and forcing everything into a stream is a stretch.

The following code does the same thing as the code at the beginning of this post, but adds two more gulp plugins, plus a watch task that reruns processJS whenever a file changes.

// gulp, gp_concat, gp_rename and gp_uglify are assumed to be required as well
var cache = require('gulp-cached'),
    remember = require('gulp-remember');

gulp.task('processJS', function () {
    return gulp.src(["gulpTest/a.js", "gulpTest/b.js"])
        .pipe(cache('processJS'))     // only let changed files through
        .pipe(remember('processJS'))  // add back every file seen so far
        .pipe(gp_concat('ab_combined.js'))
        .pipe(gulp.dest('gulpTest/dist'))
        .pipe(gp_rename('ab_uglied.js'))
        .pipe(gp_uglify())
        .pipe(gulp.dest('gulpTest/dist'));
});

// rerun processJS whenever a .js file in gulpTest changes
gulp.task('watch', function () {
    gulp.watch('gulpTest/*.js', ['processJS']);
});

gulp.task('default', ['watch', 'processJS'], function () {});

gulp-cached remembers the files that pass through it, and if a file hasn’t changed since it last saw it, it doesn’t let the file through. (It compares the file contents, or an md5 hash of them when its optimizeMemory option is set.) The purpose is to save time. For example, after the first run gulp-cached has remembered each file; if I then change only b.js, gulp-cached passes through only b.js and stops a.js.
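The caching idea can be sketched in a few lines. This is a simplified approximation, not gulp-cached's source: changedOnly and runThrough are hypothetical names, the sketch stores an md5 hash per file path in a module-level caches object, and the files are plain { path, contents } objects rather than Vinyl files.

```javascript
const { Transform } = require('stream');
const crypto = require('crypto');

// Simplified approximation of the gulp-cached idea (not its source code):
// per cache name, remember an md5 hash of each file's contents and only let
// a file through if it is new or its contents changed since the last run.
const caches = {};

function changedOnly(name) {
  const seen = caches[name] || (caches[name] = {});
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      const hash = crypto.createHash('md5').update(file.contents).digest('hex');
      if (seen[file.path] === hash) return cb(); // unchanged: drop it
      seen[file.path] = hash;
      cb(null, file);                            // new or changed: pass it on
    }
  });
}

// Demo helper: write files into a stream, resolve with the paths that come out.
function runThrough(stream, files) {
  return new Promise((resolve) => {
    const out = [];
    stream.on('data', (f) => out.push(f.path));
    stream.on('end', () => resolve(out));
    files.forEach((f) => stream.write(f));
    stream.end();
  });
}

const file = (p, text) => ({ path: p, contents: Buffer.from(text) });
const runs = [];

runThrough(changedOnly('processJS'), [file('a.js', 'a1'), file('b.js', 'b1')])
  .then((first) => {
    runs.push(first); // [ 'a.js', 'b.js' ]: everything is new
    // Second run: only b.js was edited, so only b.js gets through.
    return runThrough(changedOnly('processJS'), [file('a.js', 'a1'), file('b.js', 'b2')]);
  })
  .then((second) => {
    runs.push(second); // [ 'b.js' ]
    console.log(runs);
  });
```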

But that creates a problem: with only b.js passing through, the resulting ab_combined.js and ab_uglied.js will contain only the content of b.js.

This is where gulp-remember comes into play. Its npm page says:

gulp-remember is a gulp plugin that remembers files that have passed through it. gulp-remember adds all the files it has ever seen back into the stream.
gulp-remember pairs nicely with gulp-cached when you want to only rebuild the files that changed, but still need to operate on all files in the set.

Like gulp-cached, gulp-remember remembers each file that has passed through it, but instead of filtering, it pushes all of the remembered files through the pipeline.
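In the same spirit, here is a simplified approximation of the remember step (again not gulp-remember's actual source; rememberAll, forget and runThrough are hypothetical names). Files are stored per cache name as they arrive, and everything stored is pushed downstream in flush, once the input has ended. The demo also shows the deleted-file problem described next, and how forgetting a path fixes it.

```javascript
const { Transform } = require('stream');

// Simplified approximation of the gulp-remember idea (not its source code):
// store every file seen under a cache name; once the input ends, push *all*
// stored files downstream, not just the ones that arrived in this run.
const memory = {};

function rememberAll(name) {
  const files = memory[name] || (memory[name] = new Map());
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      files.set(file.path, file); // a newer version replaces the older one
      cb();                       // emit nothing yet
    },
    flush(cb) {
      files.forEach((file) => this.push(file));
      cb();
    }
  });
}

// Remove a path from memory, e.g. when the watcher reports a deletion.
function forget(name, filePath) {
  if (memory[name]) memory[name].delete(filePath);
}

// Demo helper: write files into a stream, resolve with the paths that come out.
function runThrough(stream, files) {
  return new Promise((resolve) => {
    const out = [];
    stream.on('data', (f) => out.push(f.path));
    stream.on('end', () => resolve(out));
    files.forEach((f) => stream.write(f));
    stream.end();
  });
}

const file = (p) => ({ path: p });
const runs = [];

runThrough(rememberAll('processJS'), [file('a.js'), file('b.js')])
  .then((r) => {
    runs.push(r); // [ 'a.js', 'b.js' ]
    // Only b.js arrives (as if filtered by the cache), yet both come out.
    return runThrough(rememberAll('processJS'), [file('b.js')]);
  })
  .then((r) => {
    runs.push(r); // [ 'a.js', 'b.js' ]
    // b.js was deleted from disk; it stays remembered until we forget it.
    forget('processJS', 'b.js');
    return runThrough(rememberAll('processJS'), []);
  })
  .then((r) => {
    runs.push(r); // [ 'a.js' ]
    console.log(runs);
  });
```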

But again this creates a problem: if I delete b.js, gulp-remember still holds it in memory and keeps passing it through.

To make gulp-remember (and gulp-cached) forget a deleted file, we have to change the watch task like this:

gulp.task('watch', function () {
    gulp.watch('gulpTest/*.js', ['processJS'])
        .on('change', function (event) {
            console.log("event happened:" + JSON.stringify(event));
            if (event.type === 'deleted') {
                // delete from gulp-remember cache
                remember.forget('processJS', event.path);
                // delete from gulp-cached cache
                delete cache.caches['processJS'][event.path];
            }
        });
});
Gulp stops being so easy, right?