Gulp is easy. That is what every gulp tutorial says, until
you have to troubleshoot a nasty issue and realize that beneath the
simple code (gulp.js itself is tiny) there are many layers of technical machinery.
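// The original post does not show its requires; these module names are assumed.
var gulp = require('gulp'),
    gp_concat = require('gulp-concat'),
    gp_rename = require('gulp-rename'),
    gp_uglify = require('gulp-uglify');

gulp.task('processJS', function(){
    return gulp.src(["gulpTest/a.js","gulpTest/b.js"])
        .pipe(gp_concat('ab_combined.js'))   // concatenate a.js and b.js
        .pipe(gulp.dest('gulpTest/dist'))    // write the combined file
        .pipe(gp_rename('ab_uglied.js'))
        .pipe(gp_uglify())                   // minify
        .pipe(gulp.dest('gulpTest/dist'));   // write the minified file
});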
Gulp lures you in with this elegant train-like code, which
reads intuitively. Every gulp plugin accepts something and outputs something for
the next gulp plugin to consume, as the following graph shows:
What makes this pipeline possible is the Stream, which is at the
core of Node.js.
Stream
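Node.js has four fundamental types of Stream, plus one convenience type built on top of them:
- Readable
- Writable
- Duplex (both readable and writable)
- Transform (a special kind of Duplex; this is the one Gulp uses)
- PassThrough (a trivial Transform that passes its input along unchanged)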
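The relationship between the last three types can be checked directly. This is a small sketch using only the core stream module; the variable name pt is just for illustration.

```javascript
const { Duplex, Transform, PassThrough } = require('stream');

// A PassThrough is a Transform, and a Transform is a Duplex.
const pt = new PassThrough();
const isTransform = pt instanceof Transform;
const isDuplex = pt instanceof Duplex;

console.log('PassThrough is a Transform:', isTransform);
console.log('PassThrough is a Duplex:', isDuplex);
```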
A Transform stream has two important methods:
- _transform(data, encoding, cb): invoked whenever the Stream receives data. This is your opportunity to apply whatever transformation you want to the data. You push the transformed data back into the pipeline using push(), and you call the callback cb to indicate you've finished your work.
- _flush(cb): invoked when there is no more data to be received. Again, you call the callback cb to indicate you've finished your work.
Here is a little exercise to get ourselves familiar with
Transform Stream: read a file, line by line, uppercase each line and output it.
To create a stream that takes input and uppercases it:
var stream = require('stream'),
    util = require('util');
var Transform = stream.Transform || require('readable-stream').Transform;

function UpperStream () {
    Transform.call(this, { "objectMode": true });
}
util.inherits(UpperStream, Transform);

UpperStream.prototype._transform = function (data, encoding, cb) {
    console.log("i received:"+ data);
    this.push(data.toString().toUpperCase());
    cb();
};

UpperStream.prototype._flush = function( cb ){
    console.log("i've finished");
    this.push("DONE");
    cb();
};
The following code reads from a file, splits it into lines
(using the split module), and uses the above UpperStream to uppercase each line.
var fs = require('fs');
var split = require('split');

var is = fs.createReadStream( "./test.txt" );
var lineIdx = 0;
is.pipe(split())
    .on('data', function (line) {
        console.log("line "+lineIdx++ +":"+line);
    })
    .pipe(new UpperStream())
    .pipe(process.stdout);
test.txt contains two lines:
aaa bbb
ccc ddd
Running the above code outputs:
line 0:aaa bbb
i received:aaa bbb
AAA BBBline 1:ccc ddd
i received:ccc ddd
CCC DDD i've finished
DONE
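To try the same idea without test.txt or the split module, here is a minimal sketch that feeds two hard-coded lines through an equivalent transform. It assumes a Node.js version that supports Readable.from and passing transform and flush as constructor options (Node 12 or later should do); the names makeUpperStream and collect are made up for this example.

```javascript
const { Readable, Transform } = require('stream');

// Same behaviour as UpperStream, built with constructor options
// instead of util.inherits. makeUpperStream is a hypothetical name.
function makeUpperStream() {
  return new Transform({
    objectMode: true,
    // called once per incoming chunk
    transform(data, encoding, cb) {
      this.push(data.toString().toUpperCase());
      cb();
    },
    // called once, after the last chunk
    flush(cb) {
      this.push('DONE');
      cb();
    }
  });
}

// Collects everything a readable stream emits into an array.
async function collect(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    chunks.push(chunk);
  }
  return chunks;
}

const upperDemo = collect(
  Readable.from(['aaa bbb', 'ccc ddd']).pipe(makeUpperStream())
);
upperDemo.then(function (chunks) {
  console.log(chunks);
});
```

The array printed at the end holds the two uppercased lines followed by the chunk that flush pushes, which mirrors the order of calls in the exercise above.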
through2
through2 is
a module that makes creating a transform stream easier. With through2, the
above UpperStream can be created as:
var through2 = require('through2');

var _transform = function (data, encoding, cb) {
    console.log("i received:"+ data);
    this.push(data.toString().toUpperCase());
    cb();
};
var _flush = function( cb ){
    console.log("i've finished");
    this.push("DONE");
    cb();
};
function UpperStream () {
    // through2.obj creates an object-mode transform stream
    return( through2.obj( _transform, _flush ) );
}
Gulp plugins typically use through2
to create their transform streams. When you read a gulp plugin's code, look
for the invocation of through2.obj
to understand what the plugin does.
Gulp basics
Gulp itself is tiny: its API contains only 4 methods.
gulp.task
gulp.task(name[, dependencies], function () {
});
This method defines a task. Note that the dependencies are run in parallel. Behind the scenes, Gulp uses Orchestrator to define and run tasks.
gulp.src
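gulp.src(glob[, options]);
This method reads the files matching glob and returns them as a stream that can be piped to plugins.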
gulp.dest
gulp.dest(path);
This method writes files to the specified path.
gulp.watch
gulp.watch(glob[, options], tasks);
This method watches certain files (defined by glob and options); when
they change, it runs the given tasks.
Note: if you run gulp.watch
in WebStorm, you may notice that after a file is changed
it takes a long time for gulp.watch to
realize the file has changed and execute the specified tasks. I think this
is because WebStorm saves files automatically, but
only after some delay, which causes the seeming slowness in gulp.watch.
So what is passed through the gulp pipeline? It is something called a Vinyl file. Below is its interface:
interface VinylFile {
    cwd : string; // default: process.cwd()
    base : string; // default: options.cwd
    path : string; // default: null
    stat : fs.Stats; // default: null
    contents : Buffer|Stream; // default: null
    isBuffer() : boolean;
    isStream() : boolean;
    isNull() : boolean;
    clone() : VinylFile;
    pipe(stream[, opt]) : Stream;
    inspect() : string;
    relative : string;
}
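To make this concrete, here is a rough sketch of the general shape of a plugin: an object-mode transform that receives one file object at a time, modifies it, and passes it on. It is not taken from any real plugin. It uses the core Transform class rather than through2, and plain { path, contents } objects as stand-ins for real Vinyl files; bannerPlugin and runBannerDemo are hypothetical names.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical plugin: prepends a comment banner to every file.
// Files here are plain { path, contents } objects standing in for
// Vinyl files; a real plugin would receive Vinyl instances.
function bannerPlugin(text) {
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      if (file.contents === null) {
        // nothing to transform, pass the file along untouched
        return cb(null, file);
      }
      const banner = Buffer.from('/* ' + text + ' */\n');
      file.contents = Buffer.concat([banner, file.contents]);
      cb(null, file); // equivalent to this.push(file); cb();
    }
  });
}

// Pushes two stand-in files through the plugin and collects the results.
async function runBannerDemo() {
  const files = [
    { path: 'gulpTest/a.js', contents: Buffer.from('var a = 1;') },
    { path: 'gulpTest/b.js', contents: null }
  ];
  const out = [];
  for await (const file of Readable.from(files).pipe(bannerPlugin('demo'))) {
    out.push(file);
  }
  return out;
}

const bannerDemo = runBannerDemo();
bannerDemo.then(function (out) {
  out.forEach(function (file) {
    console.log(file.path, file.contents && file.contents.toString());
  });
});
```

The point to notice is that what flows through the pipe is whole file objects, not raw bytes, which is why plugins create their streams in object mode.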
Armed with this bit of gulp knowledge, I find it easier to
understand what a plugin does, which comes in handy. Although
gulp boasts of being easy, things get a bit “unnatural” pretty soon: not
everything is a stream, and forcing everything to be a stream makes the code
harder to understand.
The following code does the same thing as the code at the
beginning of this post, but adds two more gulp plugins.
var cache = require('gulp-cached'),
    remember = require('gulp-remember');

gulp.task('processJS', function(){
    return gulp.src(["gulpTest/a.js","gulpTest/b.js"])
        .pipe(cache('processJS'))
        .pipe(remember('processJS'))
        .pipe(gp_concat('ab_combined.js'))
        .pipe(gulp.dest('gulpTest/dist'))
        .pipe(gp_rename('ab_uglied.js'))
        .pipe(gp_uglify())
        .pipe(gulp.dest('gulpTest/dist'));
});
gulp.task('watch', function () {
    gulp.watch('gulpTest/*.js', ['processJS']);
});
gulp.task('default', ['watch','processJS'], function(){});
What gulp-cached does
is remember the files that pass through it; if a file hasn’t changed since the last run
(it compares the file's contents, or a checksum of them, to determine
this), it doesn’t pass the file through. The purpose of
doing this is to save time. For example, after the first run gulp-cached remembers each file; if I then change only b.js, gulp-cached will
pass through only b.js and
stop a.js from passing through.
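The filtering described above can be modelled in a few lines. This is a simplified sketch of the idea, not gulp-cached's actual code: it always compares a SHA-256 checksum, uses plain { path, contents } objects instead of Vinyl files, and the names caches, cachedFilter and runOnce are invented for the example.

```javascript
const crypto = require('crypto');
const { Readable, Transform } = require('stream');

// name -> { path -> checksum }, shared across runs like a plugin-level cache
const caches = {};

// Simplified model of a "pass only changed files" filter.
// Files are plain { path, contents } stand-ins, not real Vinyl files.
function cachedFilter(name) {
  caches[name] = caches[name] || {};
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      const sum = crypto.createHash('sha256').update(file.contents).digest('hex');
      if (caches[name][file.path] === sum) {
        return cb(); // unchanged: drop the file from the stream
      }
      caches[name][file.path] = sum; // new or changed: remember it and pass it on
      cb(null, file);
    }
  });
}

// Runs one "build" and resolves with the paths that got through the filter.
async function runOnce(files) {
  const passed = [];
  for await (const file of Readable.from(files).pipe(cachedFilter('processJS'))) {
    passed.push(file.path);
  }
  return passed;
}

const cacheDemo = (async function () {
  const first = await runOnce([
    { path: 'a.js', contents: Buffer.from('var a = 1;') },
    { path: 'b.js', contents: Buffer.from('var b = 1;') }
  ]);
  // second run: only b.js has new contents
  const second = await runOnce([
    { path: 'a.js', contents: Buffer.from('var a = 1;') },
    { path: 'b.js', contents: Buffer.from('var b = 2;') }
  ]);
  return { first: first, second: second };
})();
cacheDemo.then(console.log);
```

In the second run only b.js reaches the rest of the pipeline, because a.js has the same checksum as before.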
But that creates a problem: with only b.js passing through, the resulting ab_combined.js and ab_uglied.js
will contain only the content of b.js.
This is where gulp-remember
comes into play. Its npm page says:
gulp-remember is a gulp plugin that
remembers files that have passed through it. gulp-remember adds all the files it has ever seen back into the stream.
gulp-remember pairs nicely with gulp-cached when you want to only rebuild the files that changed, but still
need to operate on all files in the set.
Like gulp-cached, gulp-remember also remembers each file that has passed
through it, and it pushes all of them through the pipeline.
But again this creates a problem: if I delete b.js, gulp-remember
still keeps it in memory and passes it through.
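Both the re-adding behaviour and the stale-file problem show up in a small model. Again, this is a sketch of the idea under simplifying assumptions, not gulp-remember's real implementation: files are plain { path, contents } objects, and remembered, rememberAll, forget and run are names invented for the example.

```javascript
const { Readable, Transform } = require('stream');

// name -> { path -> file }, kept between runs
const remembered = {};

// Simplified model: store every file seen, and on flush re-emit the
// ones that were remembered but did not come through in this run.
function rememberAll(name) {
  remembered[name] = remembered[name] || {};
  const seenThisRun = {};
  return new Transform({
    objectMode: true,
    transform(file, encoding, cb) {
      remembered[name][file.path] = file;
      seenThisRun[file.path] = true;
      cb(null, file);
    },
    flush(cb) {
      Object.keys(remembered[name]).forEach((path) => {
        if (!seenThisRun[path]) {
          this.push(remembered[name][path]);
        }
      });
      cb();
    }
  });
}

// Hypothetical helper that drops one file from the model's memory.
function forget(name, path) {
  delete remembered[name][path];
}

// Runs one "build" and resolves with the paths that leave the stream.
async function run(files) {
  const out = [];
  for await (const file of Readable.from(files).pipe(rememberAll('processJS'))) {
    out.push(file.path);
  }
  return out;
}

const rememberDemo = (async function () {
  const a = { path: 'a.js', contents: Buffer.from('var a = 1;') };
  const b = { path: 'b.js', contents: Buffer.from('var b = 1;') };
  const full = await run([a, b]); // first build sees both files
  const onlyB = await run([b]);   // only b.js came in, a.js is re-added
  const stale = await run([]);    // b.js was deleted, yet it is still emitted
  forget('processJS', 'b.js');
  const cleaned = await run([]);  // after forgetting, b.js is gone
  return { full: full, onlyB: onlyB, stale: stale, cleaned: cleaned };
})();
rememberDemo.then(console.log);
```

The stale run is the problem case: nothing new enters the stream, but the deleted b.js is still pushed out until something explicitly removes it from memory.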
To make gulp-remember (and
gulp-cached) forget, we have to make this
change to the watch task:
gulp.task('watch', function () {
    gulp.watch('gulpTest/*.js', ['processJS'])
        .on('change', function (event) {
            console.log("event happened:"+JSON.stringify(event));
            if (event.type === 'deleted') {
                //delete from gulp-remember cache
                remember.forget('processJS', event.path);
                //delete from gulp-cached cache
                delete cache.caches['processJS'][event.path];
            }
        });
});
Gulp stops being so easy, right?