Attribution Problems in SPAs: Solutions and Caveats

Markus Baersch
Jun 6, 2021

Many modern websites, especially in ecommerce, are built as Single Page Applications (SPAs). Instead of the old “here is your page, ask for the next one whenever you like” approach, an SPA may still load content from a server in response to visitor actions. However, it does so in the background and modifies the existing DOM instead of loading a complete new page and starting over with every change of URL / location. Still, most SPAs can be navigated with the browser’s back button, so for a visitor this technological difference has little to no noticeable effect. From a client-side tracking perspective, however, a clean setup can be a big challenge.

Background

SPAs usually never experience a second real page load after the visitor enters the site. All subsequent events and pageviews are triggered by some kind of dataLayer event, so a tracking setup (like GTM) can use these events as a replacement for the Container Loaded, DOM Ready or Window Loaded triggers that would otherwise fire on every single page.
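For illustration, this is roughly what such a router-driven event could look like. The event name `virtual_pageview` and the keys `pagePath` and `pageTitle` are hypothetical; a GTM Custom Event trigger would have to listen for whatever name the SPA actually pushes. A local array stands in for `window.dataLayer` so the sketch runs outside a browser.

```typescript
// Hypothetical event shape; a GTM Custom Event trigger matches on the "event" key.
type DataLayerEvent = { event: string; [key: string]: unknown };

// Stand-in for window.dataLayer so the sketch runs outside a browser.
const dataLayer: DataLayerEvent[] = [];

// To be called by the SPA router after every client-side route change.
function pushVirtualPageview(pagePath: string, pageTitle: string): void {
  dataLayer.push({
    event: "virtual_pageview", // hypothetical event name for the GTM trigger
    pagePath,
    pageTitle,
  });
}

// Two route changes as they might be reported by the router.
pushVirtualPageview("/", "Home");
pushVirtualPageview("/product/123", "Running Shoe");
```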

One of the main differences between tracking SPAs and “regular” websites is that the referrer never changes, because internal links do not trigger the classic request/response cycle in the browser.

On a regular website, every subsequent pageview carries the URL of the preceding page as its referrer; in an SPA, the original referrer remains unchanged.

Manually reloading a page inside an SPA does not change that either. And since document.referrer is a read-only attribute, it cannot simply be reset. To be precise, you can override document.referrer for the existing page with some effort, but that does not solve the problems that arise with a reload.

This causes one of the most impactful problems when measuring SPAs with Google Analytics: attribution. Assigning a visit or conversion to the right bucket can suddenly become a difficult task. And that is only one challenge among many others, such as handling user consent, working with Google Optimize or building reliable triggers (or using trigger groups).

Impact on Attribution

SPA behaviour can cause several problems when it comes to attributing a visit to the right (or desired) traffic source. Whenever a referring site uses identifiers such as UTM parameters or click IDs like the Google Ads gclid, source and medium should be defined by the parameters of the entrance URL, not by the referrer. Every reload in an SPA on a URL other than the visitor’s entrance URL might start a new session (in Universal Analytics, a change of traffic source is enough to start one) that gets attributed to the “surviving” referrer information: the original URL identifiers, which defined the source and medium of the original session, are no longer available, but the referrer is still the same. Simo Ahava dubbed this issue the “rogue referral problem” a while ago.
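To make the mechanism more tangible, here is a deliberately simplified model of how source and medium are derived from location and referrer. It is not Google's actual algorithm (search engine detection, referral exclusions and campaign precedence are left out), and the `shop.example.com` URLs are placeholders. It only shows why URL identifiers win over the referrer, and what remains once they are gone after a reload.

```typescript
// Simplified stand-in for GA's source / medium logic (NOT the real algorithm).
interface SourceMedium {
  source: string;
  medium: string;
}

function attribute(location: string, referrer: string): SourceMedium {
  const url = new URL(location);
  // 1. UTM parameters in the entrance URL take precedence.
  const utmSource = url.searchParams.get("utm_source");
  if (utmSource) {
    return { source: utmSource, medium: url.searchParams.get("utm_medium") ?? "(not set)" };
  }
  // 2. A Google Ads click id.
  if (url.searchParams.has("gclid")) {
    return { source: "google", medium: "cpc" };
  }
  // 3. An external referrer.
  if (referrer) {
    const refHost = new URL(referrer).hostname;
    if (refHost !== url.hostname) {
      return { source: refHost, medium: "referral" };
    }
  }
  // 4. Nothing at all.
  return { source: "(direct)", medium: "(none)" };
}

const referrer = "https://markus-baersch.de/some-post"; // path is illustrative
// Landing page with UTM parameters: attributed to the campaign.
const landing = attribute("https://shop.example.com/?utm_source=newsletter&utm_medium=email", referrer);
// Reload on a later SPA route: identifiers are gone, the referrer "survives".
const afterReload = attribute("https://shop.example.com/product/123", referrer);
```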

Typical Solutions

Typical “rogue referral” solutions either

  1. control the page and location parameters and / or
  2. unset (meaning: discard) the referrer after the first hit

With the second approach, the still existing referrer is ignored in case of a “hard reload”. To achieve this, either the referrer field of the Google Analytics hits is populated by a Custom JavaScript variable, or a customTask does the same job: send the referrer once, persist it, and discard the dr parameter in all subsequent outgoing hits as long as the referrer matches the persisted, already sent URL.
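A minimal sketch of the “send once, then discard” logic, assuming a simple key/value store interface (hypothetical) that stands in for localStorage or a persistent cookie. Returning `undefined` means “send this hit without a referrer”; how that is wired into the referrer field or a customTask depends on the setup and is not shown here.

```typescript
// Hypothetical minimal store; in a browser this would wrap localStorage or a cookie.
interface KeyValueStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

const SENT_REFERRER_KEY = "sent_referrer"; // hypothetical key name

// Returns the referrer to attach to the outgoing hit, or undefined if this
// referrer has already been sent once and should be discarded.
function referrerForHit(documentReferrer: string, store: KeyValueStore): string | undefined {
  if (!documentReferrer) {
    return undefined; // nothing to send
  }
  if (store.get(SENT_REFERRER_KEY) === documentReferrer) {
    return undefined; // already sent: drop dr on this hit
  }
  store.set(SENT_REFERRER_KEY, documentReferrer); // persist, then send once
  return documentReferrer;
}

// In-memory stand-in for the browser storage.
const memory = new Map<string, string>();
const store: KeyValueStore = {
  get: (k) => memory.get(k) ?? null,
  set: (k, v) => {
    memory.set(k, v);
  },
};

const ref = "https://markus-baersch.de/";
const firstHit = referrerForHit(ref, store);  // "https://markus-baersch.de/"
const secondHit = referrerForHit(ref, store); // undefined
```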

The first and more common approach uses the location and page fields of the GA configuration (or a customTask) to preserve the original entrance location with all its parameters, and only changes the page field when the visitor navigates to other URLs inside the SPA. Simo Ahava (who else?) has covered this problem and possible solutions on his blog and even provides a Custom Tag Template for GTM to persist campaign data.
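A minimal sketch of the location-preserving idea, assuming the hit fields are named after the Universal Analytics `location` and `page` fields and that some GTM variable or customTask applies them. The module-level variable survives virtual pageviews but not a reload, which is exactly the weakness of storing such data in a JS variable.

```typescript
// Field names mirror the Universal Analytics "location" and "page" fields;
// wiring them into GTM (fields to set or a customTask) is not shown.
interface HitFields {
  location: string; // always the entrance URL, including UTM / gclid parameters
  page: string;     // the current virtual page path
}

// Kept in memory: survives virtual pageviews, but is lost on every real page load.
let entranceLocation: string | null = null;

function fieldsForPageview(currentHref: string): HitFields {
  // The first call stores the landing URL; later calls keep it.
  const landing = entranceLocation ?? currentHref;
  entranceLocation = landing;
  return {
    location: landing,
    page: new URL(currentHref).pathname,
  };
}

const first = fieldsForPageview("https://shop.example.com/?utm_source=newsletter&utm_medium=email");
const second = fieldsForPageview("https://shop.example.com/product/123");
// second.location still carries the UTM parameters; second.page is "/product/123".
```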

In SPAs, referrer or original location data can be persisted for a “session” in a variety of ways. While localStorage, sessionStorage or cookies can be used by SPAs and regular sites alike, the “single page” nature allows additional options, because “the page” and its resources usually never go away. So anything stored in the dataLayer or any other JavaScript variable remains available on subsequent virtual pageviews (changes of the page value).

Comparison

  • Any storage that relies exclusively on the dataLayer or JS variables fails whenever the page is reloaded or a new page is loaded. And since the referrer does not go away in SPAs, the result is a new session in GA with a possibly different source (if the identifiers mentioned above were used in the entrance URL). A mix of SPA and non-SPA parts of a site, or several subdomains, makes this solution even more vulnerable, because additional page loads after landing on the site are simply inevitable in some cases. Add to that the mobile context, where browsers commonly reload a page after the visitor switches apps.
  • Storing data in localStorage or persistent cookies requires some kind of limited lifetime, so that a genuinely new session does not get misattributed.
  • sessionStorage and session cookies follow the browser’s session concept, which does not match a session in Universal Analytics or its equivalent in GA4.
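Because localStorage entries never expire on their own, the limited lifetime has to be implemented by the tracking code itself. A minimal sketch, assuming the value is wrapped in a JSON object with its own expiry timestamp; the key name and the 30-minute default are illustrative. The store interface matches the relevant part of the Web Storage API, so `localStorage` could be passed in directly; a `Map`-backed stand-in is used here.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Stores the value together with an absolute expiry timestamp.
function setWithExpiry(store: KeyValueStore, key: string, value: string, nowMs: number, ttlMs = THIRTY_MINUTES_MS): void {
  store.setItem(key, JSON.stringify({ value, expiresAt: nowMs + ttlMs }));
}

// Returns the value, or null if it was never stored or has expired.
function getWithExpiry(store: KeyValueStore, key: string, nowMs: number): string | null {
  const raw = store.getItem(key);
  if (raw === null) return null;
  const entry = JSON.parse(raw) as { value: string; expiresAt: number };
  if (nowMs >= entry.expiresAt) {
    store.removeItem(key); // expired: clean up and report "nothing stored"
    return null;
  }
  return entry.value;
}

// Map-backed stand-in for localStorage.
const backing = new Map<string, string>();
const memoryStore: KeyValueStore = {
  getItem: (k) => backing.get(k) ?? null,
  setItem: (k, v) => {
    backing.set(k, v);
  },
  removeItem: (k) => {
    backing.delete(k);
  },
};

const MINUTE = 60 * 1000;
setWithExpiry(memoryStore, "initial_referrer", "https://markus-baersch.de/", 0);
const after10 = getWithExpiry(memoryStore, "initial_referrer", 10 * MINUTE); // "https://markus-baersch.de/"
const after31 = getWithExpiry(memoryStore, "initial_referrer", 31 * MINUTE); // null
```

This is a fixed window measured from the time of storage. To imitate GA's inactivity-based timeout more closely, the entry would have to be rewritten with a fresh expiry on every hit.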

Using storage that outlasts a page load solves some problems but introduces new challenges. A limited storage lifetime, such as 30 minutes to match GA's standard session timeout, has one crucial disadvantage when it is used to decide whether the current referrer should be sent: document.referrer can outlive the stored value, either when the tab is never closed but a reload is forced, or when the browser restores tabs after being closed. This is especially likely to happen during a visit in mobile use cases, in particular when a visitor uses Safari.

Here is what happens when a referring site (markus-baersch.de) links to an SPA page with UTM parameters and a visitor on a mobile device clicks the link, lands on the SPA and navigates to a second page in Google Chrome (inspected with DevTools via remote debugging):

The console shows the location of the first page, the referrer and, after navigating the site, a different location. The current location no longer contains UTM parameters, but the referrer stays the same.

If the tab is not just left inactive for a while, but the whole browser is closed (e.g. via the task manager)…

… and the browser is then restarted and restores the last tab by reloading the page, the initial referrer is unfortunately still present.

At this point, everything stored in the dataLayer or JS variables during the first page load is gone. If a solution relies on this kind of storage, a new session starts and gets attributed to the referrer, regardless of how much time has elapsed between the initial entrance and the reload. There is one exception: if the referring domain is on the referral exclusion list in GA and the session is still alive, subsequent hits can be counted towards the previous session.

The same applies to sessionStorage in this case: because the browser was closed, the reloaded tab cannot access the referrer (or location) in sessionStorage, which was cleared when the browser session ended. The existing referrer is used, and a new session most likely starts with this source information.

If the application still knew the last sent referrer (from localStorage or a persistent cookie, for example), tracking could skip sending the referrer (or reuse the original entrance location). This may still lead to a new session if the last one has timed out. If discarding the referrer is the weapon of choice against rogue referrals, and the reloaded page URL differs from the landing page of the previous session (so the identifiers are missing), that new session will be direct / none. If UTMs or a click ID are still present in the URL, the attribution will match the first session.
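The three outcomes described above can be reproduced with a small simulation, assuming the previous GA session has already timed out so that the reload starts a new session. `attribute()` is a deliberately simplified stand-in for GA's source / medium logic, `sentReferrer` represents the value read from localStorage or a persistent cookie (`null` if it was lost, as with sessionStorage), and all URLs except the referring site are placeholders.

```typescript
// Simplified stand-in for GA's source / medium logic (NOT the real algorithm).
function attribute(location: string, referrer: string): string {
  const url = new URL(location);
  const utmSource = url.searchParams.get("utm_source");
  if (utmSource) return `${utmSource} / ${url.searchParams.get("utm_medium") ?? "(not set)"}`;
  if (url.searchParams.has("gclid")) return "google / cpc";
  if (referrer && new URL(referrer).hostname !== url.hostname) {
    return `${new URL(referrer).hostname} / referral`;
  }
  return "(direct) / (none)";
}

// Source of the new session after a reload under the "send the referrer only once" approach.
function sourceAfterReload(reloadedLocation: string, documentReferrer: string, sentReferrer: string | null): string {
  // Drop the referrer if it is known to have been sent before the reload.
  const referrer = sentReferrer !== null && sentReferrer === documentReferrer ? "" : documentReferrer;
  return attribute(reloadedLocation, referrer);
}

const ref = "https://markus-baersch.de/";
const landingPage = "https://shop.example.com/?utm_source=newsletter&utm_medium=email";
const productPage = "https://shop.example.com/product/123";

const persistedOtherUrl = sourceAfterReload(productPage, ref, ref); // "(direct) / (none)"
const persistedLanding = sourceAfterReload(landingPage, ref, ref);  // "newsletter / email"
const storageLost = sourceAfterReload(productPage, ref, null);      // "markus-baersch.de / referral"
```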

With a persisted original location, the new session will also be attributed to the source defined by the identifiers in the URL… if there were any. If the first session was a genuine referral visit and only the location was stored, the reload produces a direct session instead of the desired referral.

Mixed Environments

An additional aspect to consider: if your solution is based on browser storage and the site consists of several SPAs and/or non-SPAs on different subdomains, such as a separate blog, website and shop, navigating from one app to another usually triggers a full page load. Since localStorage obeys the Same-Origin Policy, which grants access to stored items only if protocol, hostname and port are the same, data stored on one subdomain is not available on another. On the initial page load of the target SPA, the referring subdomain becomes the document.referrer, so a new (“internal”) referrer is sent and stored once with every change of subdomain. If this seems too much like a constructed case, add external payment providers in SPA online shops to the mix.

While this may not be a problem for the existing session, thanks to the referral exclusion list in Universal Analytics or the automatic self-referral exclusion in GA4, a new session with the last internal referrer can occur whenever the old session has timed out or the initial referrer (stored under a browser storage key of a different subdomain) is not available. This happens, for example, when a tab is restored as demonstrated above on any subdomain other than the initial one.
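Two small helpers illustrate the difference between the storage options in such a mixed setup. The first shows that two subdomains are different origins and therefore do not share localStorage; the second builds a `document.cookie` string whose `Domain` attribute makes a persistent cookie readable on all subdomains of a site. The hostnames, cookie name and lifetime are hypothetical examples.

```typescript
// localStorage is partitioned by origin: scheme + hostname + port must all match.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Builds a value for document.cookie. The Domain attribute makes the cookie readable
// on all subdomains of parentDomain (e.g. blog. and shop.); Max-Age limits its lifetime.
function attributionCookie(name: string, value: string, parentDomain: string, maxAgeSeconds: number): string {
  return [
    `${name}=${encodeURIComponent(value)}`,
    `Max-Age=${maxAgeSeconds}`,
    `Domain=${parentDomain}`,
    "Path=/",
    "SameSite=Lax",
    "Secure",
  ].join("; ");
}

// Blog and shop on different subdomains do not share localStorage...
const shared = sameOrigin("https://blog.example.com/post", "https://shop.example.com/cart"); // false
// ...but both can read a cookie scoped to the parent domain.
const cookie = attributionCookie("initial_referrer", "https://markus-baersch.de/", "example.com", 30 * 60);
// In a browser: document.cookie = cookie;
```

The `Domain` attribute only works for the site's own parent domain; browsers reject cookies set for unrelated domains.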

Store What? Where? For How Long?

Storing both location and referrer can solve some of these problems. But how long should this information about the initial visit live?

The usual 30-minute storage lifetime prevents the tracking setup from holding back the referrer in cases where that is no longer appropriate. A second visit from the same source on the following day, for example, should be attributed correctly and not count as a direct session just because a legitimate referrer was not sent to GA.

But if the time between storing the data and reloading the page is just beyond the 30-minute window, the initial document.referrer is sent again, while the new location may not contain the original identifiers. As a result, a new session starts and is attributed to the referring website instead of, for example, the UTM source.
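The boundary problem can be sketched as a simple check, assuming the already sent referrer is stored together with a timestamp and compared against a 30-minute lifetime (an assumption mirroring GA's default session timeout). Timestamps are passed in explicitly so the example is deterministic.

```typescript
interface StoredReferrer {
  referrer: string;   // referrer that was already sent to GA
  storedAtMs: number; // when it was stored
}

// Assumed lifetime, mirroring GA's default 30-minute session timeout.
const TTL_MS = 30 * 60 * 1000;

function shouldSendReferrer(
  documentReferrer: string,
  stored: StoredReferrer | null,
  nowMs: number,
  ttlMs: number = TTL_MS,
): boolean {
  if (!documentReferrer) {
    return false; // nothing to send
  }
  // Suppress only while a matching entry is still within its lifetime.
  if (stored !== null && stored.referrer === documentReferrer && nowMs - stored.storedAtMs < ttlMs) {
    return false;
  }
  return true;
}

const MINUTE = 60 * 1000;
const stored: StoredReferrer = { referrer: "https://markus-baersch.de/", storedAtMs: 0 };
// Tab restored after 29 minutes: the referrer is still suppressed.
const after29 = shouldSendReferrer(stored.referrer, stored, 29 * MINUTE); // false
// Tab restored after 31 minutes: the unchanged document.referrer is sent again.
const after31 = shouldSendReferrer(stored.referrer, stored, 31 * MINUTE); // true
```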

As an edge case, the stored data could live “forever” (or at least for the duration of the longest conversion window) and only be updated when a new referrer (a different URL or none at all) or a new identifier occurs. This would maximize the number of conversions and transactions “correctly” attributed to the last known source, even without the Last Non-Direct Click attribution in Google Analytics. But if attributing visits, and not just conversions, to the correct source matters, and a genuine direct session should be counted in the corresponding channel, an unlimited lifetime of attribution information may be counterproductive.
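A sketch of the update rule for this long-lived variant, assuming a record that holds the referrer and the identifier parameters of the landing URL. The list of identifier parameters and the record shape are hypothetical, and expiry after the longest conversion window is left out.

```typescript
interface AttributionRecord {
  referrer: string;    // "" if the visit had no referrer
  identifiers: string; // serialized utm_* / gclid parameters of the landing URL
}

// Hypothetical selection of parameters treated as identifiers.
const IDENTIFIER_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "gclid"];

function extractIdentifiers(location: string): string {
  const params = new URL(location).searchParams;
  const found = new URLSearchParams();
  for (const name of IDENTIFIER_PARAMS) {
    const value = params.get(name);
    if (value !== null) found.set(name, value);
  }
  return found.toString();
}

// Keeps the stored record unless a new referrer (different URL or none) or new identifiers show up.
function updateRecord(previous: AttributionRecord | null, location: string, documentReferrer: string): AttributionRecord {
  const identifiers = extractIdentifiers(location);
  const newIdentifiers = identifiers !== "" && identifiers !== previous?.identifiers;
  const newReferrer = documentReferrer !== (previous?.referrer ?? "");
  if (previous === null || newIdentifiers || newReferrer) {
    return { referrer: documentReferrer, identifiers };
  }
  return previous; // nothing new: keep the stored attribution
}

const ref = "https://markus-baersch.de/";
const rec = updateRecord(null, "https://shop.example.com/?utm_source=newsletter&utm_medium=email", ref);
// Restored tab on a product page, same referrer: the stored record is kept.
const afterRestore = updateRecord(rec, "https://shop.example.com/product/123", ref);
// Typed-in visit without referrer: treated as new information and overwrites the record.
const directVisit = updateRecord(rec, "https://shop.example.com/", "");
```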

Finding a Sweet Spot

Therefore, the “best” solution depends on how important it is to attribute success (or visits) as accurately as possible to the “real” initial source of an interrupted session. Thirty minutes may not be enough for every visitor to reach a goal or complete a transaction after “pausing” a visit, even if this span matches the standard session timeout in GA.

Still, localStorage (for a single SPA / domain) or persistent cookies (for all mixed cases) seem to be the right place to store such information. To find a balanced setting, examine your existing data, if possible: how much time typically lies between a google / cpc session and a following google / organic session of the same clientId, where the second session converted (and should therefore be credited to the first one instead)? With this information, a “most accurate” attribution can be achieved either by keeping such second sessions at direct / none (discarding all identifiers and referrer information)… or by reusing the initial referrer and location, so that the second session originates from the same source as the preceding one. However, a perfect solution that exactly matches the tracking behaviour of a multi-page site in all cases seems impossible.
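The suggested analysis could look roughly like this, assuming the session data has been exported into a simple list (the `SessionRecord` shape is hypothetical, e.g. from a BigQuery or API export). It collects the gaps between a google / cpc session and a directly following, converting google / organic session of the same clientId, and reports the median as one possible summary statistic that could serve as a starting point for the storage lifetime.

```typescript
// Hypothetical export format: one row per session.
interface SessionRecord {
  clientId: string;
  startMs: number;      // session start in ms
  sourceMedium: string; // e.g. "google / cpc"
  converted: boolean;   // goal or transaction in this session
}

// Gaps (in minutes) between a google / cpc session and the directly following
// google / organic session of the same clientId, if that second session converted.
function paidToOrganicGapsMinutes(sessions: SessionRecord[]): number[] {
  const byClient = new Map<string, SessionRecord[]>();
  for (const s of sessions) {
    const list = byClient.get(s.clientId) ?? [];
    list.push(s);
    byClient.set(s.clientId, list);
  }

  const gaps: number[] = [];
  byClient.forEach((list) => {
    list.sort((a, b) => a.startMs - b.startMs); // chronological order per client
    for (let i = 1; i < list.length; i++) {
      const prev = list[i - 1];
      const next = list[i];
      if (prev.sourceMedium === "google / cpc" && next.sourceMedium === "google / organic" && next.converted) {
        gaps.push((next.startMs - prev.startMs) / 60000);
      }
    }
  });
  return gaps;
}

function median(values: number[]): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

const MINUTE = 60000;
const sessions: SessionRecord[] = [
  { clientId: "A", startMs: 0, sourceMedium: "google / cpc", converted: false },
  { clientId: "A", startMs: 45 * MINUTE, sourceMedium: "google / organic", converted: true },
  { clientId: "B", startMs: 0, sourceMedium: "google / cpc", converted: false },
  { clientId: "B", startMs: 10 * MINUTE, sourceMedium: "google / organic", converted: false },
  { clientId: "C", startMs: 120 * MINUTE, sourceMedium: "google / organic", converted: true },
  { clientId: "C", startMs: 0, sourceMedium: "google / cpc", converted: false },
];
const gaps = paidToOrganicGapsMinutes(sessions); // [45, 120]
const typicalGap = median(gaps);                 // 82.5
```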

The lack of a perfect solution makes it all the more important to consider these aspects when choosing and implementing one, since every approach might have side effects that were never taken into account. Looking for broken sessions that switch from Paid to Organic, or from UTM-tagged campaigns to Referral, might still be a good idea, even if some kind of solution is already in place.
