Jason Pan

潘忠显 / 2021-04-16


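The “How JavaScript Works” series is translated and compiled from How JavaScript works on the SessionStack website. Because the posts were published in 2017, some of the techniques or information may be out of date. This article links to the English original; author: Alexander Zlatkov, translator: 潘忠显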

How JavaScript works: Storage engines + how to choose the proper storage API

Alexander Zlatkov

Jun 13, 2018 · 15 min read

This is post #16 of the series dedicated to exploring JavaScript and its building components. In the process of identifying and describing the core elements, we also share some rules of thumb we use when building SessionStack, a JavaScript application that needs to be robust and highly performant to help users see and reproduce their web app defects in real time.

Overview

Choosing the right storage mechanisms for local device storage is crucial when designing your web app. A good storage engine makes sure your information is saved reliably, reduces bandwidth, and improves responsiveness. The right storage caching strategy is a core building block for enabling offline mobile web experiences, which more and more users expect by default.

In this chapter, we’ll discuss the available storage APIs and services and will offer some guidelines on how to make the right choice when building your web app.

Data Model

The data storage model determines how data is organized internally. This impacts the entire design of your web app and defines the tradeoffs you make between keeping the app efficient and solving the problem it is meant to solve. As with almost everything in engineering, there is no “better” approach and no one-size-fits-all solution. So, let’s take a look at the data models that you could choose from:

Persistence

Storage methods for web apps can be analyzed with respect to the timeframe over which data is made persistent:

Data persistence in the browser

Nowadays, there are quite a few browser APIs that allow you to store data. We’ll go through some of them and create a comparison to make it easier for you to choose the right option.

First, however, there are a few things that you should consider before choosing how to persist your data. Of course, the first thing you have to understand very well is how your web app is going to be used and, later, maintained and enhanced. Even once you have answered these questions, you may end up with several options to choose from. So, here is what you should look at:

Comparison

In this section, we take a look at the current APIs available for web developers and compare them across the dimensions described above.

File system API

With the FileSystem API, a web app can create, read, navigate, and write to a sandboxed section of the user’s local file system.

The API is broken up into various themes:

The File system API is a non-standard API. You shouldn’t use it in production web apps since it will not work for every user. There may be large incompatibilities between implementations, and the behavior will probably change in the future.

The File and Directory Entries API interface FileSystem is used to represent a file system. These objects can be obtained from the filesystem property on any file system entry. Some browsers offer additional APIs to create and manage file systems.

This interface will not grant you access to the user’s filesystem. Instead, you will have a “virtual drive” within the browser sandbox. If you want to gain access to the user’s filesystem, you need to involve the user, e.g. by having them install a Chrome extension.

Requesting a file system

A web app can request access to a sandboxed file system by calling window.requestFileSystem():

// Note: the method has been vendor-prefixed since Google Chrome 12.
window.requestFileSystem = window.requestFileSystem || window.webkitRequestFileSystem;
// type: window.TEMPORARY or window.PERSISTENT; size: requested storage in bytes.
window.requestFileSystem(type, size, successCallback, opt_errorCallback);

If you’re calling requestFileSystem() for the first time, new storage is created for your app. It’s important to remember that this file system is sandboxed, meaning one web app cannot access another app’s files.
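The callback-based request above can be wrapped in a Promise with a feature check, which fits the warning that the API is non-standard. This is only a sketch: `requestSandboxedFileSystem` is a hypothetical helper name, and the window object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: request a temporary sandboxed file system and
// return a Promise instead of using success/error callbacks directly.
function requestSandboxedFileSystem(win, sizeInBytes) {
  // The method is vendor-prefixed in Chrome and missing in most other browsers.
  const request = win.requestFileSystem || win.webkitRequestFileSystem;
  return new Promise((resolve, reject) => {
    if (typeof request !== "function") {
      reject(new Error("The FileSystem API is not available in this browser"));
      return;
    }
    // win.TEMPORARY asks for storage that the browser may evict at any time.
    request.call(win, win.TEMPORARY, sizeInBytes, resolve, reject);
  });
}

// In a browser that supports the API, usage might look like:
//   requestSandboxedFileSystem(window, 5 * 1024 * 1024)
//     .then((fs) => console.log("Opened file system:", fs.name))
//     .catch((error) => console.warn(error.message));
```

Rejecting when the method is missing lets the calling code fall back to another storage API instead of failing with a TypeError.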

After you get access to the file system, you can do most of the standard operations on files and directories.

The FileSystem is quite a different storage option compared to the others as it aims to satisfy client-side storage use cases not well served by databases. Generally, these are applications that deal with large binary blobs and/or share data with applications outside of the context of the browser.

These are good use-cases for the FileSystem API:

[Figure: current browser support for the FileSystem API]

Local storage

The localStorage API allows you to access a Storage object for the Document’s origin. The stored data is saved across browser sessions. localStorage is similar to sessionStorage, except that while data stored in localStorage has no expiration time, data stored in sessionStorage gets cleared when the page session ends — that is, when the page is closed.
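Because a Storage object holds only strings, structured values are usually serialized before they are saved. The following is a minimal sketch, assuming JSON is an acceptable encoding; `saveJson`, `loadJson` and the `"settings"` key are hypothetical names, and the storage object is passed in so the same helpers work with `localStorage` or `sessionStorage`.

```javascript
// Save any JSON-serializable value under a key. Returns false if the
// write fails, since setItem can throw (e.g. when the quota is exceeded).
function saveJson(storage, key, value) {
  try {
    storage.setItem(key, JSON.stringify(value));
    return true;
  } catch (error) {
    return false;
  }
}

// Load a value saved with saveJson. Returns the fallback when the key
// is missing or the stored string is not valid JSON.
function loadJson(storage, key, fallback = null) {
  const raw = storage.getItem(key);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw);
  } catch (error) {
    return fallback;
  }
}

// In a browser:
//   saveJson(window.localStorage, "settings", { theme: "dark" });
//   const settings = loadJson(window.localStorage, "settings", {});
```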

Note that data stored in either localStorage or sessionStorage is specific to the origin of the page, which is a combination of the protocol, host and the port.
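To illustrate what “same origin” means here, the standard `URL` class exposes the origin of a URL as a single string. This is a small sketch with a hypothetical helper name, `sameOrigin`, and example.com URLs chosen purely for illustration.

```javascript
// Two URLs share storage only if protocol, host and port all match.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Same protocol, host and (default) port: same origin.
console.log(sameOrigin("https://example.com/a", "https://example.com:443/b"));
// Different protocol: different origin.
console.log(sameOrigin("http://example.com/", "https://example.com/"));
// Different port: different origin.
console.log(sameOrigin("https://example.com/", "https://example.com:8443/"));
// Different host (a subdomain counts as a different host): different origin.
console.log(sameOrigin("https://app.example.com/", "https://example.com/"));
```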

[Figure: current browser support for localStorage]

Session storage

The sessionStorage property allows you to access a session Storage object for the current origin. sessionStorage is similar to localStorage, briefly explained above. The only difference is that while data stored in localStorage has no expiration set, data stored in sessionStorage gets cleared when the page session ends. A page session lasts for as long as the tab or the browser is open, and it survives page reloads and restores. Opening a page in a new tab or window will cause a new session to be initiated, which differs from how session cookies work.
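One way to see the per-tab lifetime in practice is to keep an identifier in session storage: it stays the same across reloads of a tab, while a new tab gets a new one. The sketch below is an illustration only; `getOrCreateSessionId` and the key `"example.sessionId"` are hypothetical names, and the storage object and ID generator are passed in as parameters.

```javascript
// Return the ID stored for this page session, creating one on first use.
function getOrCreateSessionId(storage, createId) {
  const KEY = "example.sessionId"; // hypothetical key name
  let id = storage.getItem(KEY);
  if (id === null) {
    id = createId();
    storage.setItem(KEY, id);
  }
  return id;
}

// In a browser (crypto.randomUUID requires a secure context):
//   const id = getOrCreateSessionId(window.sessionStorage, () => crypto.randomUUID());
```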

Note that data stored in either sessionStorage or localStorage is specific to the origin of the page.

[Figure: current browser support for sessionStorage]

Cookies

A cookie (web cookie, browser cookie) is a small piece of data that a server sends to the user’s web browser. The browser may store it and send it back with the next request to the same server. Typically, it’s used to tell whether two requests came from the same browser, for example to keep a user logged in. It remembers stateful information for the stateless HTTP protocol.
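The round trip described above can be sketched with two small functions: one builds the value of a `Set-Cookie` response header, the other parses the `name=value` pairs that the browser sends back (the same format that `document.cookie` returns). Both function names are hypothetical, and the sketch covers only a few common attributes.

```javascript
// Build a Set-Cookie header value for the server's response.
function serializeCookie(name, value, { maxAge, path = "/", secure = true, httpOnly = true } = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`, `Path=${path}`];
  if (maxAge !== undefined) parts.push(`Max-Age=${maxAge}`); // lifetime in seconds
  if (secure) parts.push("Secure");     // send over HTTPS only
  if (httpOnly) parts.push("HttpOnly"); // hide the cookie from page scripts
  return parts.join("; ");
}

// Parse a Cookie request header (or document.cookie) into an object.
// Note: decodeURIComponent throws on malformed percent-encoding.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of cookieString.split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue; // skip empty or malformed segments
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    if (name) cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}

console.log(serializeCookie("sid", "abc 123", { maxAge: 3600 }));
console.log(parseCookies("sid=abc%20123; theme=dark"));
```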

Cookies have three main use-cases:

Cookies were once used for general client-side storage. While this was legitimate when they were the only way to store data on the client, nowadays it is recommended to choose modern storage APIs. Cookies are sent with every request, so they can degrade performance (especially on a mobile data connection).

There are two types of cookies:

Note that confidential or sensitive information should never be stored or transmitted with HTTP Cookies, as the entire mechanism is inherently insecure.

And, as you might have guessed, cookies are widely supported among all browsers.

Cache

The Cache interface provides a storage mechanism for Request / Response object pairs that are cached. Note that the Cache interface is exposed to windowed scopes as well as workers. You don’t have to use it in conjunction with service workers, even though it is defined in the service worker spec.

An origin can have multiple, named Cache objects. You are responsible for implementing how your script (e.g. in a ServiceWorker) handles Cache updates. Items in a Cache do not get updated unless explicitly requested; they don’t expire unless deleted. Use CacheStorage.open() to open a specific, named Cache object and then call any of the Cache methods to maintain the Cache.

You are also responsible for periodically purging cache entries. Each browser has a hard limit on the amount of cache storage that a given origin can use. Cache quota usage estimates are available via the StorageManager.estimate() method (navigator.storage.estimate()), which resolves to a StorageEstimate object. The browser does its best to manage disk space, but it may delete the Cache storage for an origin. The browser will generally either delete all the data for an origin or none of it. Make sure to version caches by name and use the caches only from the version of the script that they can safely operate on. See Deleting old caches for more information.
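Versioning caches by name makes purging straightforward: keep the cache that the current script version uses and delete the older ones. The following is a minimal sketch, assuming cache names share a common prefix such as `"app-"` (an assumption for illustration); `deleteOldCaches` is a hypothetical helper, and the CacheStorage object is passed in as a parameter.

```javascript
// Delete every cache whose name starts with `prefix`, except the current one.
// Resolves to the list of cache names that were removed.
async function deleteOldCaches(cacheStorage, prefix, currentName) {
  const names = await cacheStorage.keys();
  const stale = names.filter((name) => name.startsWith(prefix) && name !== currentName);
  await Promise.all(stale.map((name) => cacheStorage.delete(name)));
  return stale;
}

// In a service worker, this would typically run during activation:
//   self.addEventListener("activate", (event) => {
//     event.waitUntil(deleteOldCaches(caches, "app-", "app-v2"));
//   });
```

Filtering by prefix leaves caches created by other scripts on the same origin untouched.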

The CacheStorage interface represents the storage for Cache objects.

The interface:

Use CacheStorage.open() to obtain a Cache instance.

Use CacheStorage.match() to check if a given Request is a key in any of the Cache objects that the CacheStorage object tracks.

You can access CacheStorage through the global caches property.
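As an illustration of how `open()`, `match()` and `put()` fit together, here is a sketch of a simple cache-first lookup. It is one possible strategy, not the only one; `cacheFirst` is a hypothetical function name, and the CacheStorage object and fetch function are passed in as parameters.

```javascript
// Return the cached response for a request if there is one; otherwise
// fetch it from the network and store a copy for next time.
async function cacheFirst(cacheStorage, cacheName, request, fetchFn) {
  const cache = await cacheStorage.open(cacheName);
  const cached = await cache.match(request); // undefined when nothing matches
  if (cached) return cached;

  const response = await fetchFn(request);
  // A response body can be read only once, so the cache gets a clone.
  if (response.ok) await cache.put(request, response.clone());
  return response;
}

// In a browser or service worker:
//   const response = await cacheFirst(caches, "app-v1", "/data.json", fetch);
```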

IndexedDB

IndexedDB is a way for you to persistently store data inside a user’s browser. Because it lets you create web applications with rich query abilities regardless of network availability, these applications can work both online and offline. IndexedDB is useful for applications that store a large amount of data (for example, a catalog of DVDs in a lending library) and applications that don’t need persistent internet connectivity to work (for example, mail clients, to-do lists, and notepads).

In this article, IndexedDB is the storage API that we’re going to discuss in a bit more detail, because the rest of the storage APIs are quite well known. Plus, IndexedDB is getting more and more popular as web apps grow more complex.

The internals of IndexedDB

IndexedDB lets you store and retrieve objects that are stored with a “key.” All changes that you make to the database happen within transactions. Like most web storage solutions, IndexedDB follows a same-origin policy. So while you can access stored data within an origin, you cannot access data across different origins.

IndexedDB is an asynchronous API that can be used in most contexts, including Web Workers. It used to include a synchronous version too, for use in web workers, but this was removed from the spec due to lack of interest from the web community.
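The key, transaction and asynchronous-request concepts can be sketched in a few lines. IndexedDB reports results through `onsuccess` and `onerror` callbacks on request objects, so the example wraps a request in a Promise first. All function names here are hypothetical, the store is assumed to use `"id"` as its key path, and the IndexedDB factory is passed in as a parameter.

```javascript
// Wrap an IndexedDB request (onsuccess/onerror callbacks) in a Promise.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Open (or create) version 1 of a database with one object store keyed by "id".
function openDatabase(idbFactory, dbName, storeName) {
  const request = idbFactory.open(dbName, 1);
  // Runs only when the database is created or its version number increases.
  request.onupgradeneeded = () => {
    request.result.createObjectStore(storeName, { keyPath: "id" });
  };
  return promisifyRequest(request);
}

// Write a record and read it back by key inside one read-write transaction.
async function saveAndLoad(db, storeName, record) {
  const store = db.transaction(storeName, "readwrite").objectStore(storeName);
  store.put(record); // requests in a transaction run in the order they are made
  return promisifyRequest(store.get(record.id));
}

// In a browser:
//   const db = await openDatabase(window.indexedDB, "library", "dvds");
//   const dvd = await saveAndLoad(db, "dvds", { id: 1, title: "Example" });
```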

IndexedDB used to have a competing spec called WebSQL Database, but it was deprecated by the W3C. While both IndexedDB and WebSQL are solutions for storage, they do not offer the same functionalities. WebSQL Database is a relational database access system, whereas IndexedDB is an indexed table system.

Don’t start working with IndexedDB by relying on assumptions carried over from other types of databases. Instead, read the docs carefully. Here are some of the essential concepts that you should keep in mind:

IndexedDB limitations

IndexedDB is designed to cover most cases that need client-side storage. It has not been designed, however, for a few cases such as the following:

In addition, be aware that browsers can wipe out the database in the following conditions (ouch):

The exact circumstances and browser capabilities change over time, but the general philosophy of the browser vendors is to make the best effort to keep the data when possible.

Choosing the right storage API

As I mentioned already, it’s better to choose APIs that are widely supported across as many browsers as possible and which offer asynchronous call models, to maximize UI responsiveness. These criteria lead naturally to the following technology choices:

We at SessionStack use different storage APIs. For example, our library that is integrated into your web app uses both cookies and session storage. The reason is that the library collects data such as user events, DOM changes, network data, exceptions, debug messages and so on, and sends it to our servers. We collect this data from user sessions, so we need a reliable way to determine when a user session starts and when it stops. We consider a session to be the entire period of web app usage from the start, including all page views and navigations, until the user closes the browser or the tab and doesn’t come back within a few minutes. To detect this, we use a combination of session storage and server-side logic. What’s more, we allow you to identify individual end users so that we can provide you with user data for each session. We rely on cookies to do this (just like other monitoring/analytics tools).

In our application, where you can watch (on-demand or real-time) the collected events as a video that recreates how users have stumbled upon issues, we use mainly cookies due to the RESTful nature of our service which basically requires just an authentication token to authenticate, authorize and validate requests.

There is a free plan if you’d like to give SessionStack a try.
