Date: 2019-01-01
The Node.js fs module enables us to work with the file system, which is one of the most fundamental tasks when writing programs.
Most developers think “working with files” just means reading from and writing to files. While those are the most common use cases, there is a lot more that lies under the hood.
This post will go into detail about how the file system works and some of the less understood (but equally important) concepts when working with the file system in Node.js.
To get started, let’s assume we have a directory with two files inside it:
.
├── README.md
└── index.js
README.md is a Markdown file that contains the words:
Hello World
index.js is the file that contains the Node.js code that we will execute.
We can read the contents of README.md using the fs module:
const fs = require('fs')
// fs.readFile takes the file path and the callback
fs.readFile('README.md', (err, data) => {
  // if there's an error, log it and return
  if (err) {
    console.error(err)
    return
  }
  // Print the string representation of the data
  console.log(data.toString())
})
This will give the output:
Hello World
We can write to a file in a similar fashion:
const fs = require('fs')
fs.writeFile('README.md', 'Hello World', (err) => {
  // If there is any error in writing to the file, return
  if (err) {
    console.error(err)
    return
  }
  // Log this message if the file was written to successfully
  console.log('wrote to file successfully')
})
In this case, the program will create a new file README.md and write Hello World to it. If the file already exists, it will be overwritten.
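Since writeFile replaces the existing contents by default, adding to a file needs a different flag. The following is a minimal sketch, assuming a scratch file in the OS temp directory (the file name and the overwriteThenAppend helper are hypothetical, not part of the fs module): the second write passes the 'a' flag so that it appends instead of truncating.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical scratch file, so the example leaves README.md alone
const demoFile = path.join(os.tmpdir(), 'fs-append-demo.txt')

// Writes `first`, then adds `second` using the append flag, and finally
// passes the resulting file contents to `done`
function overwriteThenAppend(file, first, second, done) {
  // Default flag 'w': create the file, or truncate it if it exists
  fs.writeFile(file, first, (err) => {
    if (err) return done(err)
    // Flag 'a': keep the existing contents and write at the end
    fs.writeFile(file, second, { flag: 'a' }, (err) => {
      if (err) return done(err)
      fs.readFile(file, 'utf8', done)
    })
  })
}

overwriteThenAppend(demoFile, 'Hello World\n', 'Hello again\n', (err, text) => {
  if (err) {
    console.error(err)
    return
  }
  console.log(text)
})
```

The fs module also has an appendFile method, which behaves like writeFile with the 'a' flag.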
You may have noticed that the APIs to read from and write to a file are asynchronous: they take a callback that is invoked once the operation has completed.
This is the recommended way to use the fs module, since almost all file system operations have to wait on the underlying OS. With the asynchronous model, the Node.js process is free to do other work while the OS completes the operation, instead of sitting idle.
The fs sync API, on the other hand, blocks the Node.js process until the OS completes its task.
The fs module still provides synchronous APIs, which you can use as follows:
const fs = require('fs')
// The writeFileSync API takes the location of the file
// and the contents to be written to it
fs.writeFileSync('README.md', 'Hello Sync API!')
// The readFileSync API reads the file and returns a
// Buffer, whose `toString` method gives the string
// representation of the file
console.log(fs.readFileSync('README.md').toString())
Although they may look simpler to use, it’s generally recommended to use the async APIs for their better performance thanks to non-blocking I/O.
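If the nesting of callbacks is what makes the sync API look attractive, the promise-based variant of the same module is a middle ground. The sketch below assumes a Node.js version that ships fs.promises; the writeThenRead helper and the scratch file name are hypothetical and only there for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Writes `text` to `file` and reads it back without blocking the event loop
async function writeThenRead(file, text) {
  await fs.promises.writeFile(file, text)
  // Passing an encoding returns a string instead of a Buffer
  return fs.promises.readFile(file, 'utf8')
}

// Hypothetical scratch file in the OS temp directory
const demoFile = path.join(os.tmpdir(), 'fs-promises-demo.txt')

writeThenRead(demoFile, 'Hello Promise API!')
  .then((text) => console.log(text))
  .catch((err) => console.error(err))
```

The code reads top to bottom like the sync version, but each await hands control back to the event loop while the OS does its work.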
Every file has permissions associated with it. These permissions determine who can read, write and execute a particular file on your system.
On Linux systems, running ls -l will print information about the files and their permissions:
-rw-r--r-- 1 soham staff 11 Nov 16 13:46 README.md
-rw-r--r-- 1 soham staff 290 Nov 16 23:21 index.js
The last nine characters of the first column give the read (r), write (w), and execute (x) permissions for the user, group, and others respectively.
These permissions can be represented by a nine-digit binary number, or a three-digit octal number. For example, the permissions of the files listed previously are rw-r--r--, which in binary is 110 100 100, and in octal is 644.
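To make the conversion concrete, here is a small sketch that turns the nine permission characters into their octal form. The permissionsToOctal function is a hypothetical helper written for this post, and it assumes plain r, w, x and - characters only (no setuid or sticky bits).

```javascript
// Converts a nine-character permission string such as 'rw-r--r--'
// into its three-digit octal representation
function permissionsToOctal(perms) {
  if (!/^([r-][w-][x-]){3}$/.test(perms)) {
    throw new Error('expected nine characters like rw-r--r--')
  }
  let octal = ''
  for (let i = 0; i < 9; i += 3) {
    // Each set bit is worth 4 (read), 2 (write) or 1 (execute)
    const digit =
      (perms[i] === 'r' ? 4 : 0) +
      (perms[i + 1] === 'w' ? 2 : 0) +
      (perms[i + 2] === 'x' ? 1 : 0)
    octal += digit
  }
  return octal
}

console.log(permissionsToOctal('rw-r--r--')) // 644
console.log(permissionsToOctal('rw-rw-r--')) // 664
console.log(permissionsToOctal('rwxr-xr-x')) // 755
```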
We can use the fs module to change the permissions of README.md to add write access for the user group (664):
const fs = require('fs')
// 0o664 adds write permission for the group
fs.chmod('README.md', 0o664, (err) => {
  if (err) {
    console.error(err)
    return
  }
  console.log('Permissions changed successfully')
})
If we list the file now, we can see the modified permissions:
↓write permission added
-rw-rw-r-- 1 soham staff 0B 17 Nov 23:33 README.md
-rw-r--r-- 1 soham staff 155B 17 Nov 23:40 index.js
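We don’t have to leave Node.js to check the result. As a rough sketch, assuming a Unix-like system and a scratch file in the OS temp directory (the file name and the readPermissions helper are hypothetical), fs.stat exposes the same permission bits through the mode property:

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Passes the permission bits of `file` to `done` as an octal string
function readPermissions(file, done) {
  fs.stat(file, (err, stats) => {
    if (err) return done(err)
    // `mode` also encodes the file type, so mask off everything
    // except the lowest nine permission bits
    done(null, (stats.mode & 0o777).toString(8))
  })
}

// Hypothetical scratch file, so the example leaves README.md alone
const demoFile = path.join(os.tmpdir(), 'fs-chmod-demo.txt')
fs.writeFileSync(demoFile, 'Hello World')

fs.chmod(demoFile, 0o664, (err) => {
  if (err) {
    console.error(err)
    return
  }
  readPermissions(demoFile, (err, octal) => {
    if (err) {
      console.error(err)
      return
    }
    console.log('permissions:', octal)
  })
})
```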
When we used the readFile and writeFile methods of the fs module, we treated the file as one chunk of data that we can read from or write to.
While this approach works for small files, it won’t scale for larger files. In this case, we need to think of each file as a stream of data, rather than a single large chunk.
Data streams allow us to work with large data without compromising the limited memory or CPU of our system. The fs module allows us to make use of streams for this purpose.
To illustrate how read streams work, let’s read a large text file (named words.txt) and count the total number of words in the file, using file streams:
const fs = require('fs')
// Initialize the time at which the program started
const startTime = new Date()
// create a read stream from the `words.txt` file
const rStream = fs.createReadStream('words.txt')
// initialize total word count
let total = 0
// the `data` handler is called every time we
// receive new data from the file stream
rStream.on('data', b => {
  // `b` here is the chunk of data received from the
  // file stream
  const bStr = b.toString()
  // We split the string by spaces and new lines and add the number of
  // pieces to the total. We subtract one because of the extra
  // space/newline/broken word at the end of the chunk.
  // We shouldn't do this for the last chunk of data, which we handle later
  total += bStr.split(/[\s\n]+/).length - 1
})
rStream.on('end', () => {
  // Finally, the `end` handler is called once the data stream completes.
  // We add one to the total, because we shouldn't subtract 1 from the last
  // chunk of data in the `data` handler, for which we're compensating here
  console.log('total words:', total + 1)
  // Print the total time taken, as well as the total used program memory
  console.log('total time:', (new Date()) - startTime)
  const memoryUsedMb = process.memoryUsage().heapUsed / 1024 / 1024
  console.log('the program used', memoryUsedMb, 'MB')
})
Running this code on my system gave the following output:
total words: 1280004
total time: 126
the program used 10.192085266113281 MB
Let’s compare this to the naive version of the same problem, where we read and split the entire file contents all at once using the fs.readFile method:
const fs = require('fs')
// Initialize the time at which the program started
const startTime = new Date()
fs.readFile('words.txt', (err, data) => {
  if (err) {
    console.error(err)
    return
  }
  // Split the words based on spaces and newlines and print the length
  const nWords = data.toString().split(/[\s\n]+/).length
  console.log('total words:', nWords)
  // print the total time taken and total program memory used
  console.log('total time:', (new Date()) - startTime)
  const memoryUsedMb = process.memoryUsage().heapUsed / 1024 / 1024
  console.log('the program used', memoryUsedMb, 'MB')
})
Running this on my system showed me the stark difference in performance:
total words: 1280004
total time: 326
the program used 84.68199920654297 MB
The naive version took roughly 2.6x as long (326 vs. 126) and used more than 8x the memory compared to the version using file streams.
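The streaming version relies on a subtract-one adjustment for words that are cut in half at a chunk boundary. A more explicit way to handle that, sketched here under the assumption that words are separated by whitespace, is to hold back the last piece of each chunk and glue it onto the next one. The createWordCounter helper is hypothetical and the chunks are hard-coded strings rather than data from a real stream.

```javascript
// Creates a counter that is fed chunks of text and keeps track of a
// word that may have been cut in half at the end of the previous chunk
function createWordCounter() {
  let total = 0
  let leftover = ''
  return {
    push(chunk) {
      const pieces = (leftover + chunk).split(/\s+/)
      // The last piece may be an incomplete word (or '' if the chunk
      // ended on whitespace), so hold it back for the next chunk
      leftover = pieces.pop()
      // Empty strings appear when the text starts with whitespace
      total += pieces.filter((word) => word !== '').length
    },
    finish() {
      // Whatever is left at the end is a complete word
      return total + (leftover === '' ? 0 : 1)
    }
  }
}

const counter = createWordCounter()
// These chunks split the word "stream" across a boundary
for (const chunk of ['read the str', 'eam in\nsmall ', 'chunks']) {
  counter.push(chunk)
}
console.log('total words:', counter.finish()) // total words: 6
```

With a real read stream, push would be called from the data handler and finish from the end handler.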
Write streams are like read streams, but in the other direction. Similar to how read streams work, we open a write stream to a file, and write to it in chunks, ending the stream once we’re done.
Here’s an example of how we can use write streams to store the first thousand numbers in the Fibonacci sequence:
const fs = require('fs')
class Fibonacci {
  // The Fibonacci class has the previous number and current
  // number as its instance attributes
  constructor() {
    this.prev = 0
    this.current = 1
  }
  // the next method returns the current value, and
  // increments the current value by adding the past value to it.
  // Note: JavaScript numbers lose integer precision above 2^53,
  // so the later values in the sequence are approximations
  next() {
    const current = this.current
    this.current = current + this.prev
    this.prev = current
    return current
  }
}
// Initialize a writeStream to a new file "fibonacci.txt"
const writeStream = fs.createWriteStream('fibonacci.txt')
// the `ready` callback gets called once the file is available to write
writeStream.on('ready', () => {
  // initialize a new object of the Fibonacci class
  const f = new Fibonacci()
  // For each iteration, obtain the next number in the sequence
  // and write to the file, adding a newline each time
  for (let i = 0; i < 1000; i++) {
    const n = f.next()
    writeStream.write(String(n) + '\n', err => {
      // if there is any error in writing, log it
      if (err) {
        console.error('error writing:', err)
      }
    })
  }
  // The `end` method closes the write stream, once we're done
  writeStream.end()
})
Similar to read streams, we gain immense performance benefits for cases where we need to write a large amount of information, or information which we do not always receive all at once, like logs.
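One detail worth knowing when writing a lot of data: write returns false once the stream’s internal buffer is full, and the stream emits a drain event when it is ready for more. The sketch below shows one way to respect that signal. The writeLines helper, the scratch file name, and the generated lines are all assumptions made for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')
const { once } = require('events')

// Writes every line to `file`, pausing whenever the stream's
// internal buffer is full
async function writeLines(file, lines) {
  const stream = fs.createWriteStream(file)
  for (const line of lines) {
    // `write` returns false once the buffer exceeds its high-water mark
    if (!stream.write(line + '\n')) {
      // Wait for the buffered data to be flushed before continuing
      await once(stream, 'drain')
    }
  }
  stream.end()
  // 'finish' fires after all data has been flushed to the file
  await once(stream, 'finish')
}

// Hypothetical output file in the OS temp directory
const demoFile = path.join(os.tmpdir(), 'fs-backpressure-demo.txt')
const lines = Array.from({ length: 100000 }, (_, i) => `line ${i}`)

writeLines(demoFile, lines)
  .then(() => console.log('wrote', lines.length, 'lines'))
  .catch((err) => console.error('error writing:', err))
```

Without the pause, everything we write is queued in memory until the file system catches up, which defeats part of the purpose of streaming.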
Consider a directory which has a file and a folder (with another file) inside it:
.
├── index.js
└── tmp
└── tmp.txt
We can use the fs.readdir method to list all the files and directories within a specified path:
const fs = require('fs')
fs.readdir('./', (err, files) => {
  if (err) {
    console.error(err)
    return
  }
  console.log('files: ', files)
})
This will give the output:
files: [ 'index.js', 'tmp' ]
While this gives us the names of the contents, it doesn’t tell us whether an entry is a file or another directory. We can set the withFileTypes option to true to get more information about each entry:
const fs = require('fs')
fs.readdir('./', { withFileTypes: true }, (err, files) => {
  if (err) {
    console.error(err)
    return
  }
  console.log('files: ')
  files.forEach(file => {
    // the `isDirectory` method returns true if the entry is a directory
    const type = file.isDirectory() ? '📂' : '📄'
    console.log(type, file.name)
  })
})
Which will give us:
files:
📄 index.js
📂 tmp
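Knowing which entries are directories is what makes it possible to descend into them. As a sketch, assuming the promise-based fs.promises.readdir is available, the hypothetical listFiles helper below collects every file under a directory. It builds a scratch tree that mirrors the layout above so that it can run anywhere.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Recursively collects the paths of all files below `dir`
async function listFiles(dir) {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true })
  const files = []
  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      // Descend into sub-directories and merge their results
      files.push(...(await listFiles(fullPath)))
    } else {
      files.push(fullPath)
    }
  }
  return files
}

// Build a small scratch tree: index.js and tmp/tmp.txt
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'fs-walk-'))
fs.writeFileSync(path.join(root, 'index.js'), '')
fs.mkdirSync(path.join(root, 'tmp'))
fs.writeFileSync(path.join(root, 'tmp', 'tmp.txt'), '')

listFiles(root)
  .then((files) => files.map((file) => path.relative(root, file)).sort())
  .then((files) => console.log('files:', files))
  .catch((err) => console.error(err))
```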
Directories can be created and removed with the fs.mkdir and fs.rmdir methods respectively.
Creating a new directory:
const fs = require('fs')
fs.mkdir('./newdir', err => {
  if (err) {
    console.error(err)
    return
  }
  console.log('directory created')
})
Removing a directory:
const fs = require('fs')
fs.rmdir('./newdir', err => {
  if (err) {
    console.error(err)
    return
  }
  console.log('directory deleted')
})
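Two behaviours of these methods are easy to trip over: mkdir does not create missing parent directories unless asked to, and rmdir refuses to remove a directory that still has contents. The sketch below demonstrates both, assuming a Node.js version that supports the recursive option of mkdir. The nestedDirDemo helper and the scratch directory names are hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Creates `base/a/b` in one call, then tries to remove `base/a` and
// reports the error code of that attempt (or null if it succeeded)
function nestedDirDemo(base, done) {
  const nested = path.join(base, 'a', 'b')
  // Without `recursive: true`, mkdir fails with ENOENT because
  // the parent directory `a` does not exist yet
  fs.mkdir(nested, { recursive: true }, (err) => {
    if (err) return done(err)
    // rmdir only removes empty directories, and `a` still contains `b`
    fs.rmdir(path.join(base, 'a'), (rmErr) => {
      done(null, rmErr ? rmErr.code : null)
    })
  })
}

// Hypothetical scratch location in the OS temp directory
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'fs-mkdir-demo-'))

nestedDirDemo(base, (err, code) => {
  if (err) {
    console.error(err)
    return
  }
  console.log('rmdir on a non-empty directory failed with:', code)
})
```

On Linux the reported code is ENOTEMPTY; other platforms may report a different code.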
Directory streams are used to walk through a directory entry by entry, rather than listing all the entries at once.
Similar to file streams, this is useful when a directory has a large number of files, or when you want to go through the files in a directory and its subdirectories recursively.
A directory stream can be opened using the fs.opendir method. The directory stream is provided in the callback argument, and we can use the dir.read method to read the next entry in the directory:
const fs = require('fs')
// The async `opendir` method creates a stream from the directory
// passed as its first argument. The stream is passed to the callback
fs.opendir('./', (err, dir) => {
  if (err) {
    // log and return if there's any error
    console.error(err)
    return
  }
  // A scoped function is defined that reads the next
  // entry in the directory and calls itself recursively
  const readNext = (dir) => {
    // The `read` method gives us information on the
    // next entry in the directory. If there are no
    // more entries left, the value of `file` is null
    dir.read((err, file) => {
      if (err) {
        // log and return error
        console.error(err)
        return
      }
      // If file is null, we are done, so we close the directory handle
      if (file === null) {
        dir.close(() => {})
        return
      }
      // If the entry exists, log the name, along with
      // the icon for its type
      const type = file.isDirectory() ? '📂' : '📄'
      console.log(type, file.name)
      // Recursively call `readNext` for the next directory entry
      readNext(dir)
    })
  }
  // Call the `readNext` function to read the first directory entry
  readNext(dir)
})
This gives me the output:
📄 delete.js
📄 index.js
📄 create.js
📄 streams.js
📂 tmp
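The recursive callback can be replaced with a loop, because the directory stream is also async-iterable. This is a sketch, assuming a Node.js version that provides fs.promises.opendir; the listEntries helper is hypothetical.

```javascript
const fs = require('fs')

// Returns a label for each entry in `dirPath`, reading them one at a time
async function listEntries(dirPath) {
  const dir = await fs.promises.opendir(dirPath)
  const labels = []
  // Each step of the loop yields one directory entry. The directory
  // handle is closed automatically when the loop ends
  for await (const entry of dir) {
    const type = entry.isDirectory() ? '📂' : '📄'
    labels.push(`${type} ${entry.name}`)
  }
  return labels
}

listEntries('./')
  .then((labels) => labels.forEach((label) => console.log(label)))
  .catch((err) => console.error(err))
```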
Let’s talk more about the err argument that we keep seeing in all the callbacks of the file system API.
Under normal conditions, we expect err to be null, but there are some common errors that you should watch out for.
Let’s create a directory with two files and one folder:
-rw-r--r-- 1 soham staff 400B 26 Nov 10:06 index.js
-r--r--r-- 1 soham staff 0B 26 Nov 10:03 restricted.txt
drwxr-xr-x 3 soham staff 96B 26 Nov 10:04 tmp
Now, let’s run some code in index.js to demonstrate some common errors.
First, let’s try to read from a file that doesn’t exist:
const fs = require('fs')
fs.readFile('./does-not-exist.txt', (err, data) => {
  // the error code is present in the error's `code` property
  console.error('./does-not-exist.txt: ', err.code)
})
This gives us the output:
./does-not-exist.txt: ENOENT
ENOENT here means that the file does not exist (it expands to “Error NO ENTry”).
The restricted.txt file has only read permissions for the user, the group, and everyone else. This means a user running a program that writes to this file will receive an error:
fs.writeFile('restricted.txt', 'sample data', (err) => {
  console.error('restricted.txt: ', err.code)
})
This will output:
restricted.txt: EACCES
What do you think happens if we call the readFile method on a directory?
fs.readFile('./tmp', (err, data) => {
  console.error('./tmp: ', err.code)
})
Well, of course, this will give us an error:
./tmp: EISDIR
Conversely, we can try to run the opendir method on an entry that’s not a directory:
fs.opendir('./index.js', (err, dir) => {
  console.error('index.js :', err.code)
})
This will give us the ENOTDIR error:
index.js : ENOTDIR
Note: These error codes are actually the same error codes returned by the OS. The codes discussed above are for Unix systems, and may differ if you’re on Windows.
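In practice, the snippets above would crash on success, since err is null when nothing goes wrong. A more defensive pattern, sketched below, checks err first and then branches on its code. The describeFsError helper and the wording of its messages are hypothetical; only the error codes themselves come from the system.

```javascript
const fs = require('fs')

// Maps the error codes discussed above to a readable message.
// The wording of the messages is illustrative
function describeFsError(err) {
  switch (err.code) {
    case 'ENOENT':
      return 'the file or directory does not exist'
    case 'EACCES':
      return 'permission denied'
    case 'EISDIR':
      return 'expected a file but found a directory'
    case 'ENOTDIR':
      return 'expected a directory but found something else'
    default:
      return `unexpected error (${err.code})`
  }
}

fs.readFile('./does-not-exist.txt', (err, data) => {
  // Always check `err` before touching `data`
  if (err) {
    console.error('./does-not-exist.txt:', describeFsError(err))
    return
  }
  console.log(data.toString())
})
```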
