Introduction to Frontend Engineering 06 - Frontend Monitoring

When to Monitor#

  1. When your application frequently reports errors and the cause is unclear.
  2. When you need to analyze users' interests and purchasing habits.
  3. When you need to optimize the program and want monitoring data to make the optimization targeted.
  4. When you need to guarantee the reliability and stability of a service.

If your application meets any of the above criteria, it is worth implementing monitoring. Monitoring serves two purposes: pre-warning and post-analysis.

Pre-warning: Set a threshold in advance, and when the monitored data reaches the threshold, notify the administrator via SMS or email. For example, if the number of API requests suddenly spikes, an alert should be triggered; otherwise, it may cause the server to crash.

Post-analysis: Analyze the cause of failures and the points of failure through monitoring log files. This allows for modifications to prevent such situations from occurring again.

This chapter is divided into two parts: an analysis of front-end monitoring principles and how to add monitoring to a project. The first part discusses how to write a simple monitoring SDK, while the second part discusses how to use Sentry to implement project monitoring.

Now, let's get into the main content.

A complete front-end monitoring platform consists of three parts: data collection and reporting, data organization and storage, and data presentation.

Performance Data Collection#

The Chrome development team has proposed a series of metrics for detecting web performance:

  • FP (first-paint), the time from the start of page loading to the first pixel being rendered on the screen.
  • FCP (first-contentful-paint), the time from the start of page loading to any part of the page content being rendered on the screen.
  • LCP (largest-contentful-paint), the time from the start of page loading to the largest text block or image element being rendered on the screen.
  • CLS (cumulative layout shift, collected via layout-shift entries), the cumulative score of all unexpected layout shifts that occur from the start of page loading until its lifecycle state becomes hidden.

These four performance metrics are obtained through PerformanceObserver (they can also be read with performance.getEntriesByName(), but that approach gives no notification when an entry is recorded). PerformanceObserver is an observer object for performance measurement events.

FP#

FP (first-paint), the time from the start of page loading to the first pixel being rendered on the screen. FP can also be loosely understood as the white-screen time.

The measurement code is as follows:

const entryHandler = (list) => {
    for (const entry of list.getEntries()) {
        if (entry.name === 'first-paint') {
            observer.disconnect()
        }

        console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
// The buffered property means buffered (historical) entries are also delivered, so it does not matter if the observer is registered after the event has already fired.
observer.observe({ type: 'paint', buffered: true })

Using the above code, you can obtain the content of FP:

{
    duration: 0,
    entryType: "paint",
    name: "first-paint",
    startTime: 359, // fp time
}

Where startTime is the rendering time we want.

FCP#

FCP (first-contentful-paint), the time from the start of page loading to any part of the page content being rendered on the screen. For this metric, "content" refers to text, images (including background images), <svg> elements, or non-white <canvas> elements.

To provide a good user experience, the FCP score should be controlled within 1.8 seconds.

Measurement code:

const entryHandler = (list) => {
    for (const entry of list.getEntries()) {
        if (entry.name === 'first-contentful-paint') {
            observer.disconnect()
        }

        console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'paint', buffered: true })

Using the above code, you can obtain the content of FCP:

{
    duration: 0,
    entryType: "paint",
    name: "first-contentful-paint",
    startTime: 459, // fcp time
}

Where startTime is the rendering time we want.

LCP#

LCP (largest-contentful-paint), the time from the start of page loading to the largest text block or image element being rendered on the screen. The LCP metric reports the render time of the largest image or text block visible in the viewport, relative to when the page first started loading.

A good LCP score should be controlled within 2.5 seconds.

Measurement code:

const entryHandler = (list) => {
    if (observer) {
        observer.disconnect()
    }

    for (const entry of list.getEntries()) {
        console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'largest-contentful-paint', buffered: true })

Using the above code, you can obtain the content of LCP:

{
    duration: 0,
    element: p,
    entryType: "largest-contentful-paint",
    id: "",
    loadTime: 0,
    name: "",
    renderTime: 1021.299,
    size: 37932,
    startTime: 1021.299,
    url: "",
}

Where startTime is the rendering time we want, and element is the DOM element that produced the LCP.

The difference between FCP and LCP is that FCP is triggered as soon as any content is rendered, while LCP is triggered when the largest content is fully rendered.

The types of elements considered for LCP are:

  • <img> elements
  • <image> elements embedded within <svg> elements
  • <video> elements (using the poster image)
  • Elements with background images loaded through the url() function (rather than using CSS gradients)
  • Block-level elements containing text nodes or other inline text elements as children.

CLS#

CLS (cumulative layout shift, collected via layout-shift entries) is the cumulative score of all unexpected layout shifts from the start of page loading until the page's lifecycle state becomes hidden.

The calculation of layout shift scores is as follows:

Layout shift score = impact score * distance score

The impact score measures the effect of unstable elements on the visible area between two frames.

The distance score refers to the maximum distance (horizontal or vertical) that any unstable element is displaced in a frame divided by the maximum dimension of the visible area (width or height, whichever is larger).
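For example (hypothetical numbers, matching the standard illustration of this formula): suppose an unstable element occupies the top half of the viewport and shifts down by 25% of the viewport height. The union of its positions across the two frames covers 75% of the viewport, so the impact score is 0.75; the largest displacement is 25% of the viewport height, and assuming height is the larger viewport dimension, the distance score is 0.25. The layout shift score is therefore 0.75 * 0.25 = 0.1875.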

CLS is the sum of all layout shift scores.

A layout-shift entry is emitted whenever a DOM element shifts position between two rendered frames.

At the same time, in CLS, there is a term called session window: one or more rapidly occurring single layout shifts, with each shift occurring less than 1 second apart, and the maximum duration of the entire window being 5 seconds.

For example, consider a session window containing four layout shifts: the interval between consecutive shifts must be less than 1 second, and the time between the first and last shift must not exceed 5 seconds for them to count as one session window. If these conditions are not met, a new session window is started. Some may ask why it is specified this way; this is the result of extensive experiments and research by the Chrome team, described in Evolving the CLS metric.

There are three calculation methods for CLS:

  1. Cumulative
  2. Average of all session windows
  3. Maximum value among all session windows

Cumulative#

This means summing all layout shift scores from the start of page loading. However, this calculation method is not friendly to pages with long lifetimes; the longer the page stays, the higher the CLS score.

Average of all session windows#

This calculation method does not count individual layout shifts but rather counts by session window. It sums the values of all session windows and then takes the average. However, this method also has drawbacks.

Suppose the first session window produces a relatively large CLS score and the second session window produces a relatively small one. If the average of the two is taken as the CLS score, it does not reflect the page's actual behavior: the page shifted a lot early on and little later, and the average hides that pattern.

Maximum value among all session windows#

This method is currently the optimal calculation method, taking only the maximum value among all session windows to reflect the worst-case scenario of page layout shifts. For details, see Evolving the CLS metric.

Here is the measurement code for the third calculation method:

let sessionValue = 0
let sessionEntries = []
const cls = {
    subType: 'layout-shift',
    name: 'layout-shift',
    type: 'performance',
    pageURL: getPageURL(),
    value: 0,
}

const entryHandler = (list) => {
    for (const entry of list.getEntries()) {
        // Only count layout shifts without recent user input.
        if (!entry.hadRecentInput) {
            const firstSessionEntry = sessionEntries[0]
            const lastSessionEntry = sessionEntries[sessionEntries.length - 1]

            // If the entry occurred less than 1 second after the previous entry and
            // less than 5 seconds after the first entry in the session, include the
            // entry in the current session. Otherwise, start a new session.
            if (
                sessionValue
                && entry.startTime - lastSessionEntry.startTime < 1000
                && entry.startTime - firstSessionEntry.startTime < 5000
            ) {
                sessionValue += entry.value
                sessionEntries.push(formatCLSEntry(entry))
            } else {
                sessionValue = entry.value
                sessionEntries = [formatCLSEntry(entry)]
            }

            // If the current session value is larger than the current CLS value,
            // update CLS and the entries contributing to it.
            if (sessionValue > cls.value) {
                cls.value = sessionValue
                cls.entries = sessionEntries
                cls.startTime = performance.now()
                lazyReportCache(deepCopy(cls))
            }
        }
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'layout-shift', buffered: true })

After reading the above text description, the code becomes easier to understand. The measurement content of a layout shift is as follows:

{
  duration: 0,
  entryType: "layout-shift",
  hadRecentInput: false,
  lastInputTime: 0,
  name: "",
  sources: (2) [LayoutShiftAttribution, LayoutShiftAttribution],
  startTime: 1176.199999999255,
  value: 0.000005752046026677329,
}

The value field in the entry is the layout shift score.

DOMContentLoaded and Load Events#

The DOMContentLoaded event fires when the initial HTML document has been completely loaded and parsed, without waiting for stylesheets, images, or iframes to finish loading.

The load event is triggered when the entire page and all dependent resources such as stylesheets and images have completed loading.

Although these two performance metrics are relatively old, they can still reflect something about the page's condition, so it is still worth listening for them.

import { lazyReportCache } from '../utils/report'

['load', 'DOMContentLoaded'].forEach(type => onEvent(type))

function onEvent(type) {
    function callback() {
        lazyReportCache({
            type: 'performance',
            subType: type.toLocaleLowerCase(),
            startTime: performance.now(),
        })

        window.removeEventListener(type, callback, true)
    }

    window.addEventListener(type, callback, true)
}

First Screen Rendering Time#

In most cases, the first screen rendering time can be obtained through the load event, except for some special cases, such as asynchronously loaded images and DOM.

<script>
    setTimeout(() => {
        document.body.innerHTML = `
            <div>
                <!-- Omitted a bunch of code... -->
            </div>
        `
    }, 3000)
</script>

In such cases, the first screen rendering time cannot be obtained through the load event. Instead we need MutationObserver, which fires a callback whenever the observed DOM mutates (nodes added or removed, attributes changed, and so on).

The calculation process for first screen rendering time is as follows:

  1. Use MutationObserver to observe the document object, so a callback fires whenever the DOM changes.
  2. Determine whether the changed DOM element is within the first screen; if it is, call performance.now() in a requestAnimationFrame() callback to take the current time as its rendering time.
  3. Compare the rendering time of the last such DOM element with the loading times of all images in the first screen, and take the maximum as the first screen rendering time.

Listening to DOM Changes#

const next = window.requestAnimationFrame ? requestAnimationFrame : setTimeout
const ignoreDOMList = ['STYLE', 'SCRIPT', 'LINK']
let entries = [] // first-screen render records, consumed by getRenderTime()

observer = new MutationObserver(mutationList => {
    const entry = {
        children: [],
    }

    for (const mutation of mutationList) {
        if (mutation.addedNodes.length && isInScreen(mutation.target)) {
            // ... (the elided code walks mutation.addedNodes, skips tags in
            // ignoreDOMList, and collects the rest into entry.children)
        }
    }

    if (entry.children.length) {
        entries.push(entry)
        next(() => {
            entry.startTime = performance.now()
        })
    }
})

observer.observe(document, {
    childList: true,
    subtree: true,
})

The above code listens for DOM changes while filtering out tags like style, script, and link.

Determining if an Element is in the First Screen#

A page may contain a lot of content, but users can only see one screen's worth of content at most. Therefore, when calculating the first screen rendering time, it is necessary to limit the range and restrict the rendered content to what is currently visible on the screen.

const viewportWidth = window.innerWidth
const viewportHeight = window.innerHeight

// Check whether the DOM element is within the first screen (a simplified check: only its top and left edges are tested)
function isInScreen(dom) {
    const rectInfo = dom.getBoundingClientRect()
    if (rectInfo.left < viewportWidth && rectInfo.top < viewportHeight) {
        return true
    }

    return false
}

Using requestAnimationFrame() to Get DOM Rendering Time#

When the DOM change triggers the MutationObserver event, it only indicates that the DOM content can be read; it does not mean that the DOM has been rendered on the screen.

When the MutationObserver event is triggered, it may be possible to read that document.body already has content, but in reality, the screen has not rendered any content. Therefore, it is necessary to call requestAnimationFrame() to obtain the current time as the DOM rendering time after the browser has successfully rendered.
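This can be wrapped in a small helper (markRenderTime is a hypothetical name, not part of the SDK above):

function markRenderTime(entry) {
    // requestAnimationFrame fires right before the next paint, so a single
    // call gives a close approximation; nesting two calls samples the time
    // after the frame has actually been committed to the screen.
    requestAnimationFrame(() => {
        requestAnimationFrame(() => {
            entry.startTime = performance.now()
        })
    })
}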

Comparing with All Image Loading Times on the First Screen#

function getRenderTime() {
    let startTime = 0
    entries.forEach(entry => {
        if (entry.startTime > startTime) {
            startTime = entry.startTime
        }
    })

    // Compare with the loading times of all images on the current page, taking the maximum value
    // Image request time must be less than startTime, and response end time must be greater than startTime
    performance.getEntriesByType('resource').forEach(item => {
        if (
            item.initiatorType === 'img'
            && item.fetchStart < startTime
            && item.responseEnd > startTime
        ) {
            startTime = item.responseEnd
        }
    })

    return startTime
}

Optimization#

The current code is not fully optimized, and there are two main points to note:

  1. When to report rendering time?
  2. How to handle the case of asynchronously added DOM?

For the first point, rendering time must be reported after the DOM no longer changes. Generally, after the load event is triggered, the DOM will no longer change, so we can report at that time.

For the second point, reporting can be allowed after the LCP event is triggered. Regardless of whether the DOM is loaded synchronously or asynchronously, it needs to be rendered, so the LCP event can be listened to, and reporting can be allowed after that event is triggered.

Combining the above two solutions results in the following code:

let isOnLoaded = false
executeAfterLoad(() => {
    isOnLoaded = true
})

let timer
let observer
function checkDOMChange() {
    clearTimeout(timer)
    timer = setTimeout(() => {
        // After the load and lcp events are triggered and the DOM tree no longer changes, calculate the first screen rendering time
        if (isOnLoaded && isLCPDone()) {
            observer && observer.disconnect()
            lazyReportCache({
                type: 'performance',
                subType: 'first-screen-paint',
                startTime: getRenderTime(),
                pageURL: getPageURL(),
            })

            entries = null
        } else {
            checkDOMChange()
        }
    }, 500)
}

checkDOMChange() is called each time the MutationObserver callback fires, and the clearTimeout/setTimeout pair debounces it, so the calculation only runs once the DOM has been quiet for 500 ms.
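The executeAfterLoad() and isLCPDone() helpers are not shown above. Minimal sketches might look like this (these are assumptions, not necessarily the SDK's exact implementations; they rely on the fact that the browser stops emitting LCP candidates after the first user interaction):

function executeAfterLoad(callback) {
    if (document.readyState === 'complete') {
        callback()
    } else {
        window.addEventListener('load', callback, { once: true })
    }
}

// The browser stops reporting LCP candidates after the first click or
// keydown, so LCP can be treated as "done" once such an input occurs.
let lcpDone = false
const markLCPDone = () => { lcpDone = true }
window.addEventListener('click', markLCPDone, { once: true, capture: true })
window.addEventListener('keydown', markLCPDone, { once: true, capture: true })

function isLCPDone() {
    return lcpDone
}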

API Request Duration#

API request durations need to be collected for both XMLHttpRequest and fetch.

Listening to XMLHttpRequest

const originalProto = XMLHttpRequest.prototype
const originalOpen = originalProto.open
const originalSend = originalProto.send

originalProto.open = function newOpen(...args) {
    this.url = args[1]
    this.method = args[0]
    originalOpen.apply(this, args)
}

originalProto.send = function newSend(...args) {
    this.startTime = Date.now()

    const onLoadend = () => {
        this.endTime = Date.now()
        this.duration = this.endTime - this.startTime

        const { status, duration, startTime, endTime, url, method } = this
        const reportData = {
            status,
            duration,
            startTime,
            endTime,
            url,
            method: (method || 'GET').toUpperCase(),
            success: status >= 200 && status < 300,
            subType: 'xhr',
            type: 'performance',
        }

        lazyReportCache(reportData)

        this.removeEventListener('loadend', onLoadend, true)
    }

    this.addEventListener('loadend', onLoadend, true)
    originalSend.apply(this, args)
}

How do you determine whether an XHR request succeeded? Check whether its status code is between 200 and 299: if so, it succeeded; otherwise, it failed.

Listening to fetch

// Capture and bind the native fetch before overwriting it; calling an
// unbound fetch can throw "Illegal invocation" in some browsers.
const originalFetch = window.fetch.bind(window)

function overwriteFetch() {
    window.fetch = function newFetch(url, config) {
        const startTime = Date.now()
        const reportData = {
            startTime,
            url,
            method: (config?.method || 'GET').toUpperCase(),
            subType: 'fetch',
            type: 'performance',
        }

        return originalFetch(url, config)
        .then(res => {
            reportData.endTime = Date.now()
            reportData.duration = reportData.endTime - reportData.startTime

            const data = res.clone()
            reportData.status = data.status
            reportData.success = data.ok

            lazyReportCache(reportData)

            return res
        })
        .catch(err => {
            reportData.endTime = Date.now()
            reportData.duration = reportData.endTime - reportData.startTime
            reportData.status = 0
            reportData.success = false

            lazyReportCache(reportData)

            throw err
        })
    }
}

For fetch, you can determine whether the request succeeded from the ok field of the returned Response object: if it is true, the request succeeded; otherwise, it failed.

Note: the request durations collected this way may differ from what Chrome DevTools shows. DevTools measures the HTTP request itself, while xhr and fetch are asynchronous: after the response arrives, the callback is placed in the message queue and waits for the browser to process it, and that queuing time is included in our measurement.

Resource Loading Time and Cache Hit Rate#

You can listen for resource and navigation events through PerformanceObserver. If the browser does not support PerformanceObserver, you can also use performance.getEntriesByType(entryType) for fallback processing.

When the resource event is triggered, you can obtain the corresponding list of resource entries, where each entry contains a number of timing fields. From these fields we can extract some useful information:

{
    name: entry.name, // Resource name
    subType: entryType,
    type: 'performance',
    sourceType: entry.initiatorType, // Resource type
    duration: entry.duration, // Resource loading duration
    dns: entry.domainLookupEnd - entry.domainLookupStart, // DNS duration
    tcp: entry.connectEnd - entry.connectStart, // TCP connection establishment duration
    redirect: entry.redirectEnd - entry.redirectStart, // Redirect duration
    ttfb: entry.responseStart, // Time to first byte
    protocol: entry.nextHopProtocol, // Request protocol
    responseBodySize: entry.encodedBodySize, // Response content size
    responseHeaderSize: entry.transferSize - entry.encodedBodySize, // Response header size
    resourceSize: entry.decodedBodySize, // Resource size after decompression
    isCache: isCache(entry), // Whether the cache was hit
    startTime: performance.now(),
}

Determining if the Resource Hit the Cache

In these resource objects, there is a transferSize field that indicates the size of the resource obtained, including the size of the response header and response data. If this value is 0, it means it was read directly from the cache (strong cache). If this value is not 0, but the encodedBodySize field is 0, it means it used negotiated caching (encodedBodySize indicates the size of the request response data body).

function isCache(entry) {
    // Read directly from cache or 304
    return entry.transferSize === 0 || (entry.transferSize !== 0 && entry.encodedBodySize === 0)
}

Resources that meet neither condition did not hit the cache. The cache hit rate can then be calculated as the number of cached resources divided by the total number of resources.
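A small sketch of that calculation over the collected entries, using the isCache() function above:

function getCacheHitRate(entries) {
    if (!entries.length) return 0
    return entries.filter(isCache).length / entries.length
}

// For example, over everything the page has loaded so far:
const rate = getCacheHitRate(performance.getEntriesByType('resource'))
console.log(`cache hit rate: ${(rate * 100).toFixed(2)}%`)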

Browser Back/Forward Cache (BFC)#

BFC (the back/forward cache, often called bfcache) is an in-memory cache that stores a snapshot of the entire page, so that when the user navigates back, the page is shown immediately without reloading. According to the article bfcache, Firefox and Safari have long supported it, while Chrome only enables it in recent versions of its mobile browser. However, in my own tests only Safari worked; perhaps my Firefox version was wrong.

However, BFC also has a drawback: when the user navigates back and the page is restored from BFC, the original page code is not executed again. For this, the browser provides the pageshow event, where any code that needs to run again can be placed.

window.addEventListener('pageshow', function(event) {
  // If this property is true, it means the page was restored from the bfcache.
  if (event.persisted) {
    console.log('This page was restored from the bfcache.');
  } else {
    console.log('This page was loaded normally.');
  }
});

For pages restored from BFC, we also need to collect their FP, FCP, LCP, and other timing metrics.

onBFCacheRestore(event => {
    requestAnimationFrame(() => {
        ['first-paint', 'first-contentful-paint'].forEach(type => {
            lazyReportCache({
                startTime: performance.now() - event.timeStamp,
                name: type,
                subType: type,
                type: 'performance',
                pageURL: getPageURL(),
                bfc: true,
            })
        })
    })
})

The above code is easy to understand: after the pageshow event fires, the current time minus the event's timestamp gives the rendering time of the performance metric. Note that the performance metrics of pages restored from BFC are generally very small, usually around 10 ms, so a bfc: true flag is added to them; this way they can be excluded when computing performance statistics.
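The onBFCacheRestore() helper used above is just a thin wrapper around the pageshow event shown earlier; a minimal sketch:

function onBFCacheRestore(callback) {
    window.addEventListener('pageshow', event => {
        // persisted is true only when the page comes out of the bfcache
        if (event.persisted) {
            callback(event)
        }
    }, true)
}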

FPS#

Using requestAnimationFrame(), we can calculate the current page's FPS.

const next = window.requestAnimationFrame
    ? requestAnimationFrame : (callback) => { setTimeout(callback, 1000 / 60) }

const frames = []

export default function fps() {
    let frame = 0
    let lastSecond = Date.now()

    function calculateFPS() {
        frame++
        const now = Date.now()
        if (lastSecond + 1000 <= now) {
            // Since now - lastSecond is in milliseconds, frame needs to be * 1000
            const fps = Math.round((frame * 1000) / (now - lastSecond))
            frames.push(fps)

            frame = 0
            lastSecond = now
        }

        // To avoid reporting too quickly, cache a certain number before reporting
        if (frames.length >= 60) {
            report(deepCopy({
                frames,
                type: 'performance',
                subType: 'fps',
            }))

            frames.length = 0
        }

        next(calculateFPS)
    }

    calculateFPS()
}

The code logic: record an initial time, and increment the frame count by 1 each time requestAnimationFrame() fires. Once a second has passed, frame count / elapsed time gives the current frame rate.

When three consecutive FPS values below 20 occur, we can conclude that the page is stuttering. For details, see How to Monitor Web Page Stuttering.

export function isBlocking(fpsList, below = 20, last = 3) {
    let count = 0
    for (let i = 0; i < fpsList.length; i++) {
        if (fpsList[i] && fpsList[i] < below) {
            count++
        } else {
            count = 0
        }

        if (count >= last) {
            return true
        }
    }

    return false
}

Vue Router Change Rendering Time#

We already know how to calculate the first screen rendering time, but how do we calculate the rendering time when the page route changes in an SPA? This section uses Vue as an example to explain my approach.

export default function onVueRouter(Vue, router) {
    let isFirst = true
    let startTime
    router.beforeEach((to, from, next) => {
        // The first entry into the page is already covered by the first-screen rendering time
        if (isFirst) {
            isFirst = false
            return next()
        }

        // Add a new field to the router indicating whether to calculate rendering time
        // Only route jumps need to be calculated
        router.needCalculateRenderTime = true
        startTime = performance.now()

        next()
    })

    let timer
    Vue.mixin({
        mounted() {
            if (!router.needCalculateRenderTime) return

            this.$nextTick(() => {
                // This code will only run after the entire view has been rendered
                const now = performance.now()
                clearTimeout(timer)

                timer = setTimeout(() => {
                    router.needCalculateRenderTime = false
                    lazyReportCache({
                        type: 'performance',
                        subType: 'vue-router-change-paint',
                        duration: now - startTime,
                        startTime: now,
                        pageURL: getPageURL(),
                    })
                }, 1000)
            })
        },
    })
}

The code logic is as follows:

  1. Listen for the routing hook: router.beforeEach() fires when the route changes, and in its callback the current time is recorded as the rendering start time.
  2. Use Vue.mixin() to inject a debounced function into every component's mounted() hook.
  3. When the last component's mounted() fires, all components under the route have been mounted, and the rendering time can be taken in the this.$nextTick() callback.

At the same time, consider another situation: components can also change while the route stays the same, and rendering time should not be calculated in those components' mounted(). Hence the needCalculateRenderTime field, which is set to true only when the route switches, indicating that rendering time should be calculated.

Error Data Collection#

Resource Loading Errors#

Using addEventListener() to listen for error events can capture resource loading failures. Note that these error events do not bubble, so the listener must be registered in the capture phase (the third argument true in the code below).

// Capture resource loading failure errors for js, css, img, etc.
window.addEventListener('error', e => {
    const target = e.target
    if (!target) return

    if (target.src || target.href) {
        const url = target.src || target.href
        lazyReportCache({
            url,
            type: 'error',
            subType: 'resource',
            startTime: e.timeStamp,
            html: target.outerHTML,
            resourceType: target.tagName,
            // e.path is non-standard; prefer the standard composedPath() when available
            paths: (e.composedPath ? e.composedPath() : e.path || []).map(item => item.tagName).filter(Boolean),
            pageURL: getPageURL(),
        })
    }
}, true)

JS Errors#

Using window.onerror can listen for JS errors.

// Listen for JS errors
window.onerror = (msg, url, line, column, error) => {
    lazyReportCache({
        msg,
        line,
        column,
        error: error?.stack, // error can be null, e.g. for cross-origin scripts
        subType: 'js',
        pageURL: url,
        type: 'error',
        startTime: performance.now(),
    })
}

Promise Errors#

Using addEventListener() to listen for unhandledrejection events can capture unhandled promise errors.

// Listen for promise errors; the downside is that it cannot obtain column data
window.addEventListener('unhandledrejection', e => {
    lazyReportCache({
        reason: e.reason?.stack,
        subType: 'promise',
        type: 'error',
        startTime: e.timeStamp,
        pageURL: getPageURL(),
    })
})

Source Map#

Generally, production code is minified, and the source map files are not deployed to production, so error messages from production code are very hard to read. We can therefore use source-map to map errors in the minified code back to the original source.

When a code error occurs, we can obtain the corresponding file name, line number, and column number:

{
    line: 1,
    column: 17,
    file: 'https://www.xxx.com/bundle.js',
}

Then call the following code to restore:

const fs = require('fs')
const path = require('path')
const sourceMap = require('source-map')

async function parse(error) {
    const mapObj = JSON.parse(getMapFileContent(error.url))
    const consumer = await new sourceMap.SourceMapConsumer(mapObj)
    // Remove ./ from webpack://source-map-demo/./src/index.js
    const sources = mapObj.sources.map(item => format(item))
    // Use the compressed error message to derive the uncompressed error line and column numbers and source file
    const originalInfo = consumer.originalPositionFor({ line: error.line, column: error.column })
    // sourcesContent contains the uncompressed source code of each file; find the corresponding source code based on the file name
    const originalFileContent = mapObj.sourcesContent[sources.indexOf(originalInfo.source)]
    return {
        file: originalInfo.source,
        content: originalFileContent,
        line: originalInfo.line,
        column: originalInfo.column,
        msg: error.msg,
        error: error.error
    }
}

function format(item) {
    return item.replace(/(\.\/)*/g, '')
}

function getMapFileContent(url) {
    return fs.readFileSync(path.resolve(__dirname, `./maps/${url.split('/').pop()}.map`), 'utf-8')
}

Each time the project is packaged, if sourcemap is enabled, each js file will have a corresponding map file.

bundle.js
bundle.js.map

At this point, the js file is placed on a static server for users to access, while the map file is stored on the server for restoring error messages. The source-map library can restore the uncompressed code error messages based on the compressed code error information. For example, if the compressed error position is line 1, column 47, the restored position may be line 4, column 10. In addition to position information, the original source code can also be obtained.
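With webpack, for example, this split can be achieved with the hidden-source-map setting, which emits the .map files without appending a sourceMappingURL comment to the bundles (a sketch; adapt it to your build tool):

// webpack.config.js
module.exports = {
    mode: 'production',
    // Generate map files but do not reference them from the bundles,
    // so they can be kept off the CDN and uploaded to the monitoring server.
    devtool: 'hidden-source-map',
}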

Vue Errors#

window.onerror cannot capture errors thrown inside Vue components, because Vue intercepts them internally; Vue's own API must be used to listen for them.

Vue.config.errorHandler = (err, vm, info) => {
    // Print the error message to the console
    console.error(err)

    lazyReportCache({
        info,
        error: err.stack,
        subType: 'vue',
        type: 'error',
        startTime: performance.now(),
        pageURL: getPageURL(),
    })
}

Behavior Data Collection#

PV and UV#

PV (page view) is the number of page views, and UV (unique visitor) is the number of distinct visitors. Each page access counts one PV, while multiple visits by the same user within a day count as a single UV.

For the front end, it is sufficient to report a PV each time the page is entered, while the UV statistics can be done on the server side, mainly by analyzing the reported data to calculate UV.

export default function pv() {
    lazyReportCache({
        type: 'behavior',
        subType: 'pv',
        startTime: performance.now(),
        pageURL: getPageURL(),
        referrer: document.referrer,
        uuid: getUUID(),
    })
}
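The getUUID() helper used throughout is assumed to return a stable, anonymous per-browser identifier that the server can use to deduplicate visits into UV. A minimal sketch:

function getUUID() {
    let uuid = localStorage.getItem('uuid')
    if (!uuid) {
        // crypto.randomUUID() requires a secure context; fall back otherwise
        uuid = window.crypto?.randomUUID
            ? crypto.randomUUID()
            : `${Date.now()}-${Math.random().toString(16).slice(2)}`
        localStorage.setItem('uuid', uuid)
    }
    return uuid
}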

Page Stay Duration#

When a user enters a page, record an initial time. When the user leaves the page, subtract the initial time from the current time to get the user's stay duration. This calculation logic can be done in the beforeunload event.

export default function pageAccessDuration() {
    onBeforeunload(() => {
        report({
            type: 'behavior',
            subType: 'page-access-duration',
            startTime: performance.now(),
            pageURL: getPageURL(),
            uuid: getUUID(),
        }, true)
    })
}

Page Access Depth#

Recording page access depth is very useful. For example, compare two campaign pages A and B: if page A has an average access depth of only 50% while page B reaches 80%, page B is clearly more engaging to users, and page A can be modified accordingly.

In addition, access depth combined with stay duration can help identify fake orders in e-commerce. For example, one user scrolls quickly to the bottom of the page, waits a while, and then purchases, while another scrolls down slowly before purchasing. Although their stay durations are the same, the first user is clearly more likely to be placing a fake order.

The calculation process for page access depth is slightly more complex:

  1. When the user enters the page, record the current time, scrollTop value, visible height of the page, and total height of the page.
  2. When the user scrolls the page, the scroll event is triggered, and in the callback function, use the data obtained in the first step to calculate the page access depth and stay duration.
  3. When the user scrolls to a certain point and stops to continue viewing the page, record the current time, scrollTop value, visible height of the page, and total height of the page.
  4. Repeat step two...

The specific code is as follows:

let timer
let startTime = 0
let hasReport = false
let pageHeight = 0
let scrollTop = 0
let viewportHeight = 0

export default function pageAccessHeight() {
    window.addEventListener('scroll', onScroll)

    onBeforeunload(() => {
        const now = performance.now()
        report({
            startTime: now,
            duration: now - startTime,
            type: 'behavior',
            subType: 'page-access-height',
            pageURL: getPageURL(),
            value: toPercent((scrollTop + viewportHeight) / pageHeight),
            uuid: getUUID(),
        }, true)
    })

    // After the page loads, initialize the current access height and time
    executeAfterLoad(() => {
        startTime = performance.now()
        pageHeight = document.documentElement.scrollHeight || document.body.scrollHeight
        scrollTop = document.documentElement.scrollTop || document.body.scrollTop
        viewportHeight = window.innerHeight
    })
}

function onScroll() {
    clearTimeout(timer)
    const now = performance.now()

    if (!hasReport) {
        hasReport = true
        lazyReportCache({
            startTime: now,
            duration: now - startTime,
            type: 'behavior',
            subType: 'page-access-height',
            pageURL: getPageURL(),
            value: toPercent((scrollTop + viewportHeight) / pageHeight),
            uuid: getUUID(),
        })
    }

    timer = setTimeout(() => {
        hasReport = false
        startTime = now
        pageHeight = document.documentElement.scrollHeight || document.body.scrollHeight
        scrollTop = document.documentElement.scrollTop || document.body.scrollTop
        viewportHeight = window.innerHeight
    }, 500)
}

function toPercent(val) {
    if (val >= 1) return '100%'
    return (val * 100).toFixed(2) + '%'
}

User Clicks#

Using addEventListener() to listen for mousedown and touchstart events, we can collect information about the size of each click area, the coordinates of the click in the entire page, the content of the clicked element, and more.

export default function onClick() {
    ['mousedown', 'touchstart'].forEach(eventType => {
        let timer
        window.addEventListener(eventType, event => {
            clearTimeout(timer)
            timer = setTimeout(() => {
                const target = event.target
                const { top, left } = target.getBoundingClientRect()

                lazyReportCache({
                    top,
                    left,
                    eventType,
                    pageHeight: document.documentElement.scrollHeight || document.body.scrollHeight,
                    scrollTop: document.documentElement.scrollTop || document.body.scrollTop,
                    type: 'behavior',
                    subType: 'click',
                    target: target.tagName,
                    // event.path is non-standard; prefer composedPath() when available
                    paths: (event.composedPath ? event.composedPath() : event.path)?.map(item => item.tagName).filter(Boolean),
                    startTime: event.timeStamp,
                    pageURL: getPageURL(),
                    outerHTML: target.outerHTML,
                    innerHTML: target.innerHTML,
                    width: target.offsetWidth,
                    height: target.offsetHeight,
                    viewport: {
                        width: window.innerWidth,
                        height: window.innerHeight,
                    },
                    uuid: getUUID(),
                })
            }, 500)
        })
    })
}

Page Navigation#

Using addEventListener(), we can listen for the popstate and hashchange navigation events. Note that calling history.pushState() or history.replaceState() does not trigger popstate; the event fires only on a browser action, such as the user clicking the back button (or a JavaScript call to history.back() or history.forward()). The same applies to hashchange.

export default function pageChange() {
    let from = ''
    window.addEventListener('popstate', () => {
        const to = getPageURL()

        lazyReportCache({
            from,
            to,
            type: 'behavior',
            subType: 'popstate',
            startTime: performance.now(),
            uuid: getUUID(),
        })

        from = to
    }, true)

    let oldURL = ''
    window.addEventListener('hashchange', event => {
        const newURL = event.newURL

        lazyReportCache({
            from: oldURL,
            to: newURL,
            type: 'behavior',
            subType: 'hashchange',
            startTime: performance.now(),
            uuid: getUUID(),
        })

        oldURL = newURL
    }, true)
}

Vue Router Changes#

Vue can use the router.beforeEach hook to listen for route changes.

export default function onVueRouter(router) {
    router.beforeEach((to, from, next) => {
        // No need to count for the first page load
        if (!from.name) {
            return next()
        }

        const data = {
            params: to.params,
            query: to.query,
        }

        lazyReportCache({
            data,
            name: to.name || to.path,
            type: 'behavior',
            subType: ['vue-router-change', 'pv'],
            startTime: performance.now(),
            from: from.fullPath,
            to: to.fullPath,
            uuid: getUUID(),
        })

        next()
    })
}

Data Reporting#

Reporting Methods#

Data can be reported with navigator.sendBeacon(), with XMLHttpRequest as a fallback (an image beacon is another common option).

Using sendBeacon for reporting has clear advantages.

The sendBeacon() method allows the user agent to asynchronously send data to the server when it has the opportunity, without delaying the unloading of the page or affecting the loading performance of the next navigation. This solves all the problems of submitting analytical data: data reliability, asynchronous transmission, and no impact on the loading of the next page.

In browsers that do not support sendBeacon, XMLHttpRequest can be used instead. An HTTP request has two halves, sending and receiving, and for reporting we only need the sending half to succeed; whether the response arrives does not matter. As an experiment, I sent 30 KB of data (reported payloads are rarely that large) with XMLHttpRequest inside the beforeunload event, and it was sent successfully in every browser I tried, though this also depends on hardware performance and network conditions.
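A sketch of a report() function following this strategy (https://example.com/report is a placeholder endpoint):

function report(data) {
    const body = JSON.stringify(data)

    // Prefer sendBeacon: it survives page unload and does not block it.
    if (navigator.sendBeacon && navigator.sendBeacon('https://example.com/report', body)) {
        return
    }

    // Fallback: fire-and-forget XMLHttpRequest.
    const xhr = new XMLHttpRequest()
    xhr.open('POST', 'https://example.com/report', true)
    xhr.send(body)
}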

Reporting Timing#

There are three reporting timings:

  1. Delay reporting with requestIdleCallback/setTimeout.
  2. Report in the beforeunload callback.
  3. Cache the reporting data and report once a certain amount has accumulated.

It is recommended to combine the three methods for reporting:

  1. First, cache the reporting data, and after caching to a certain quantity, use requestIdleCallback/setTimeout for delayed reporting.
  2. When leaving the page, report all unreported data uniformly.
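Putting these together, the lazyReportCache() used throughout this article might look like the following sketch (report() is the function from the previous section; the batch size and delay are arbitrary choices, and the server is assumed to accept an array of records):

const cache = []
const MAX_CACHE = 20 // arbitrary batch size

function flush() {
    if (!cache.length) return
    report(cache.splice(0, cache.length))
}

function lazyReportCache(data, delay = 3000) {
    cache.push(data)

    // Report immediately once enough entries have accumulated.
    if (cache.length >= MAX_CACHE) {
        flush()
        return
    }

    // Otherwise flush when the browser is idle, with setTimeout as a fallback.
    if (window.requestIdleCallback) {
        requestIdleCallback(flush, { timeout: delay })
    } else {
        setTimeout(flush, delay)
    }
}

// Report whatever is left when the user leaves the page.
window.addEventListener('beforeunload', flush)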

Front-end Monitoring Deployment#

The previous discussion covered monitoring principles, but implementing them still means writing code yourself. To avoid that trouble, we can use an existing tool like Sentry.

Sentry is a performance and error monitoring tool whose backend is written in Python. You can use the service Sentry provides (the free tier is limited) or deploy your own instance. Now let's see how to use Sentry's service to implement monitoring.
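As a taste of what that looks like, wiring up Sentry's browser SDK for error monitoring takes only a few lines (the DSN below is a placeholder; copy the real one from your Sentry project settings):

import * as Sentry from '@sentry/browser'

Sentry.init({
    // Placeholder DSN from your Sentry project settings
    dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
})

// From this point on, uncaught errors and unhandled promise rejections
// are captured and reported to Sentry automatically.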

Summary#

With the development of web technology, the scale of front-end projects is also increasing. With the help of monitoring systems, we can better understand the operational status of projects and make targeted optimizations based on the collected error and performance data.

In the next chapter, we will discuss how to perform performance optimization.

