Wall-modeled lattice Boltzmann large-eddy simulation of neutral atmospheric boundary layers pressure-based

The lattice Boltzmann method (LBM) sees a growing popularity in the ﬁeld of atmospheric sciences and wind energy, largely due to its excellent computational performance. Still, LBM large-eddy simulation (LES) studies of canonical atmospheric boundary layer ﬂows remain limited. One reason for this is the early stage of development of LBM-speciﬁc wall models. In this work, we discuss LBM–LES of isothermal pressure-driven rough-wall boundary layers using a cumulant collision model. To that end, we also present a novel wall modeling approach, referred to as inverse momentum exchange method (iMEM). The iMEM enforces a wall shear stress at the off-wall grid points by adjusting the slip velocity in bounce-back boundary schemes. In contrast to other methods, the approach does not rely on the eddy viscosity, nor does it require the reconstruction of distribution functions. Initially, we investigate different aspects of the modeling of the wall shear stress, i.e., an averaging of the input velocity as well as the wall-normal distance of its sampling location. Particularly, sampling locations above the ﬁrst off-wall node are found to be an effective measure to reduce the occurring log-layer mismatch. Furthermore, we analyze the turbulence statistics at different grid resolutions. The results are compared to phenomenological scaling laws, experimental, and numerical references. The analysis demonstrates a satisfactory performance of the numerical model, speciﬁcally when compared to a well-established mixed pseudo-spectral ﬁnite difference (PSFD) solver. Generally, the study underlines the suitability of the LBM and particularly the cumulant LBM for computationally efﬁcient LES of wall-modeled boundary layer ﬂows.


I. INTRODUCTION
Large-eddy simulation (LES) has become one of the most widely used numerical methods for the study of the atmospheric boundary layer (ABL). 1 In LES, the large energy-containing turbulent structures of the flow are explicitly resolved, while only the effects of sub-grid scales (SGS) are parametrized. Numerous fundamental investigations of turbulent fluxes in the ABL have been conducted using LES ever since the pioneering works by Smagorinsky 2 and Lilly. 3 Prominent aspects of investigation are processes in the stable and convective boundary layer, diurnal variations as well as flows over plant canopies or complex terrain. 1 Other, more engineering-driven applications can be found in the fields of urban flows 4,5 and wind energy. 6 Arguably, the most important factor facilitating the growing use of LES in the aforementioned areas has been the increasing availability and efficiency of computational resources. This not only led to a wider adoption of the LES technique. It also facilitated a continuous growth of the size of computed problems. 1,7 Still, the requirements concerning computational resources keep growing along with their availability, and the high computational cost remains a main bottleneck for the application of the method. One manifestation thereof are the ongoing discussions on grid resolution requirements for well-resolved LES. A common starting point has been the widely used criterion introduced by Pope, 8 who suggested that at least 80% of the total turbulence kinetic energy (TKE) should be resolved. Later, Davidson 9 or Wurps et al., 10 for instance, advocated for more rigorous criteria based on two-point correlations requiring notably finer grids. Others showed that an adequate reproduction of high-order moments requires grid resolutions well above the choices found in many boundary layer studies. [11][12][13] On that account, Bou-Zeid 7 argued for "a push of LES outside its comfort zone." In other words, the author demands for a correct reproduction of statistics beyond second order which eventually requires larger amounts of grid points. As for more applied research or industrial applications, requirements concerning the convergence of high-order statistics are potentially less strict as the focus often lies on a few key design parameters. 14,15 Also, some statistics can simply be less relevant to the physics of interest. See, for instance, the discussions on the effects of non-Gaussian statistics on wind turbine loads. [16][17][18] For industrial applications, the sheer runtime poses a more critical performance criterion. For instance, L€ ohner 19 argues that a widespread adoption of LES in the industrial practice hinges on the feasibility of over-night runs. Others even strive for LES as a forecasting tool, such as for pollutant dispersion [20][21][22] or wind farm performance, 23,24 and thus require computational performances beyond real-time. In summary, increasing computational efficiency remains a key objective for ongoing research for both fundamental and industrial atmospheric applications of LES.
One promising approach to increase the computational performance of LES is the lattice Boltzmann method (LBM). The LBM numerically solves the simplified Boltzmann equation (BE) which can be shown to converge to the solution of the weakly compressible Navier-Stokes equations in the limit of low Mach and Knudsen numbers. [25][26][27] The decisive factor for the excellent computational performance is the simplicity of the numerical scheme, characterized by an explicit time-stepping, the locality of all non-linear terms, and a direct advection (streaming) requiring no interpolation. Furthermore, these characteristics render the LBM particularly suitable for implementations on GPUs (graphics processing units). In various fields, GPUbased LBM codes have been shown to push the scope of LES computations due to their low cost to performance ratios. 19,28,29 This also includes applications in the wider field of atmospheric sciences, such as urban 20,22,30,31 or canopy flows, 32 wind energy 24,33,34 or flows over complex terrain. 35 Nevertheless, lattice Boltzmann LES of canonical ABL flows are still rare. To our best knowledge, only Feng et al. 36 presented such simulations using a hybrid recursive regularized LBM approach coupled to a finite volume solver for the temperature and other scalars. One reason for the novelty of these applications is the necessity of wall models. After all, the first LBM-specific wall-modeling approaches only emerged in the mid-2010s (see Refs. [37][38][39]. Still, the ongoing work in the field of LBM wall models indicates the perpetual need for further developments. 40 Similarly, crucial developments of collision operators suitable for high Reynolds numbers date back to the same period. See, for instance, Refs. 41-43. In this study, we explore the potential of the lattice Boltzmann method (LBM) for LES of neutral ABL flows. The analysis is based on a series of open-channel flow simulations (disregarding Coriolis forces). More specifically, we employ a cumulant collision model in conjunction with an anisotropic minimum dissipation (AMD) SGS model. Furthermore, we propose a novel LBM wall-modeling approach that is coupled to classical bounce-back boundary schemes. The proposed wall model seeks to address a number of deficiencies of existing approaches while retaining numerical simplicity. The presented numerical framework is analyzed by means of the canonical test case of a pressure-driven isothermal rough-wall boundary layer capped by a solid lid.
The remainder of the paper is organized as follows: In Sec. II, we provide a brief introduction to the LBM, followed by a description and motivation for the cumulant collision model as well as the utilized SGS model. In Sec. III, we first review the most prominent existing wallmodeling approaches for the LBM. We then introduce the LBMspecific aspects of the new wall model together with the details for the modeling of the wall shear stress. Further details upon the numerical setup and simulated test case are given in Sec. IV. In Sec. V, we briefly investigate the impact of the wall shear stress model. Using the most promising wall shear stress model, a more detailed grid sensitivity study is presented in Sec. VI. This includes a thorough comparison of various turbulence statistics to phenomenological scaling laws as well as numerical and experimental references. A final discussion and concluding remarks are given in Sec. VII.

II. THE LATTICE BOLTZMANN METHOD
The fundamental variable of the LBM is the particle distribution function f. Particle distribution functions describe the occurrence of particle densities with a certain velocity in space and time. The transport equation of f is the Boltzmann equation (BE). In the LBM, the velocity space is reduced to a finite set of discrete velocity directions, the velocity lattice. By means of the method of characteristics, the BE can be integrated along each discrete velocity direction resulting in the lattice Boltzmann equation (LBE), where x ¼ ðx; y; zÞ and t refer to space and time, respectively, and e ijk is the lattice vector. In the presented study, we employ a D3Q27 lattice 44 with e ijk ¼ ðic; jc; kcÞ; i; j; k 2 fÀ1; 0; 1g fÀ1; 0; 1g fÀ1; 0; 1g; (2) where c ¼ Dx=Dt is the lattice velocity. From Eqs. (1) and (2), it can be inferred that distribution functions move exactly from one lattice node to another during one time step Dt. Equation (1) is thus commonly split into a local collision step, providing the post-collision distribution function f Ã ijk , followed by the advection or streaming step, The collision operator X ijk on the right-hand side of Eq. (1) describes the redistribution of particle densities through particle collisions. Further details thereupon will be given in Sec. II A. Macroscopic quantities can be obtained from the raw velocity moments of f ijk , viz., For instance, the macroscopic density q is given by the zeroth-order moment m 000 while the first-order moments m 100 , m 010 , and m 001 refer to the momentum in the three coordinate directions qu, qv, and qw, respectively. The macroscopic pressure p is related to the fluid density via the isothermal equation of state, with c s being the speed of sound.

A. The cumulant collision model
The classical collision operator for the LBM is based on the Bhatnagar-Gross-Kroog (BGK) collision model. 52 The BGK collision model directly relaxes the distribution functions toward an equilibrium f eq ijk ðu; qÞ using a single relaxation time s. While being particularly simple, a crucial insufficiency of the BGK model is numerical instabilities at high Reynolds numbers. As a consequence, numerous alternative collision models have been presented over the past two decades; see, e.g., Refs. 25,42,53,and 54. In this study, we employ the cumulant collision model (hereafter referred to as CLBM) introduced by Geier and coworkers. 27,43 The CLBM is based on a relaxation of cumulants of the particle distribution functions. Cumulants possess the favorable features of frame invariance and statistical independence. A relaxation in cumulant space thus eliminates various deficiencies of MRT model, such as a lack of Galilean invariance and the introduction of hyper-viscosities. 27 Based on the moment generating function of the pre-collision particle distribution functions, f ðnÞ e ÀNÁn dn; with N ¼ fN; !; Zg denoting the particle velocity n ¼ fn; t; fg in wave number space, cumulants c abc can be obtained as Subsequently, cumulants are relaxed toward their respective equilibrium c eq abc using a relaxation rate x abc , c Ã abc ¼ x abc c eq abc þ ð1 À x abc Þ c abc : Finally, the post-collision cumulants c Ã abc are transformed back to particle distribution space and advected.
Via an asymptotic analysis, 27 it can be shown that the collision rate of the second-order cumulants (a þ b þ c ¼ 2) x 1 relates to the kinematic viscosity according to As for the relaxation rates of higher-order cumulants, we apply the parametrization presented in Ref. 43. The parametrization provides fourth-order spatial accuracy in diffusion. Stabilizing fourth-order limiters as suggested in Ref. 43 were not employed due to the use of an eddy-viscosity SGS model.

B. Sub-grid scale model
A common approach for LBM-LES is the adoption of eddyviscosity SGS models as found in classical LES based on the filtered Navier-Stokes equations (NSE). 55 To that end, the molecular viscosity in Eq. (10) is replaced by the total viscosity, where t is the eddy viscosity. Typically, t is computed from gridfiltered macroscopic quantities [hereafter denoted by ð~Þ] in direct analogy to the original descriptions. [56][57][58] In the presented work, we apply the anisotropic minimum dissipation (AMD) model by Rozema et al. 59 Minimum dissipation models aim for an eddy viscosity that provides the minimum eddy dissipation required to prevent the accumulation of sub-grid scale kinetic energy within a filter box X b . In the AMD model, the upper limit of the SGS kinetic energy is approximated by the Poincar e inequality, where C i is the Poincar e constant. The eddy viscosity is given by 3g being the scaled gradient operator andS ij being the filtered strain rate tensor. For further details, we refer to Refs. 59 and 60. In the CLBM, the strain rate tensor is readily available in the collision operator by means of the second-order cumulants. The remaining components of the velocity gradient tensor are computed by second-order finite differences of the resolved velocity. The AMD model thus comes at a comparable computational cost as a standard Smagorinsky model. Furthermore, the locality makes it particularly suited for memory-bound hardware, such as GPUs. In contrast, most dynamic eddy-viscosity approaches entail spatial and/or temporal filtering that bring about notable performance losses. Despite its simplicity, recent studies highlight the AMD as a cost-competitive alternative to dynamic Smagorinksy models for ABL simulations. 61,62 III. WALL MODELLING Due to the very large Reynolds numbers encountered in ABL flows, a parametrization of the bottom surface becomes imperative. It is, therefore, a common practice in LES of ABL flows to replace the no-slip boundary condition at the surface with a stress boundary condition prescribing a wall shear stress s w via Monin-Obukhov similarity theory (MOST), where u is the stream-wise velocity, u Ã ¼ ffiffiffiffiffiffiffiffiffiffi s w =q p the friction velocity and / M the momentum stability correction. 1 To that end, wall models sample the bulk velocity u el ¼ uðx el Þ at a sampling point x el , commonly referred to as exchange location, 63,64 with a corresponding wall-normal distance z el . From u el and the wall-normal unit vector n, the wall-tangential stream-wise velocity as well as the corresponding basis vector can be obtained as and respectively. Inserting u wm ¼ jju el;t 0 jj and z el , Eq. (14) can be solved for s w , and the stress at the wall can be prescribed as A more detailed discussion about optimal values for z el as well as additional filtering or averaging of u el will be given in Sec. III B 2.
In the LBM, the topic of wall modeling only emerged recently with the first study being due to Malaspinas and Sagaut. 37 Ever since, a handful of applications of wall-modeled LBM-LES have been presented. 38,39,[65][66][67][68] Still, the topic of wall modeling in the LBM arguably remains at an early stage when compared to classical LES. 40 Furthermore, it is noticeable that the few wall models presented in the literature differ considerably in how the wall stress is prescribed at the boundary. In the following, we will first give an overview of existing wall modeling approaches in the LBM. In Sec. III B 1, we will then introduce a new approach. A summarizing illustration of all discussed models is given in Fig. 1. Note that we follow the common coordinate convention in ABL flows, i.e., x, y, and z being the stream-wise, lateral, and wall-normal direction, respectively.

A. Recent advances in LBM wall modeling
The diversity of existing LBM wall models largely originates from the differences between the boundary condition approaches they are based on, i.e., so-called wet-node boundary conditions (WNBCs) and bounce-back schemes (BBSs), respectively.

Wall models based on wet-node boundary conditions
Generally, WNBCs seek to reconstruct missing particle distribution functions at the first fluid node off a solid boundary. [70][71][72] The initial wall model by Malaspinas and Sagaut 37 as well as later developments thereof 39,66,68,73 are based on WNBCs. In brief, a wall stress s w is obtained using a near-wall velocity function like Eq. (14) and the velocity sampled at an exchange location with z el > z 1 . The velocity u 1 at the first fluid node x 1 can then again be computed from where z 1 is the wall normal distance to x 1 . With the density q 1 ¼ qðx 1 Þ interpolated from the surrounding fluid nodes, f ðx 1 Þ is reconstructed as follows: where the non-equilibrium distribution f neq can be obtained either from the macroscopic stress tensor at x 1 or from an extrapolation of the non-equilibrium distributions of inward fluid nodes, such as x 2 . 67 A schematic of the approach is given in Fig. 1(a). The aforementioned studies have shown that WNBC-based wall models can provide a robust framework for turbulent boundary layer simulations. Yet, spurious oscillations near the wall due to the interpolation of u el and the macroscopic quantities at the boundary node appear to be a persistent issue. 39,68,74 Furthermore, it should be pointed out that the approach inherently forbids the occurrence of resolved shear stress at the first off-wall node x 1 since f ðx 1 Þ is solely reconstructed using wall-tangential velocity components. This also reflects in the choice of RANS turbulence models in the near-wall region wall in conjunction with this wall model. 39

Slip-velocity-based wall models using bounce-back schemes
BBSs are based on the idea that distribution functions f BB ijk propagating from a location x BB toward a solid wall are reflected and therefore reappear at the boundary node as f ijk , with e ijk ¼ Àe ijk ; thus, where f uw ijk accounts for the additional momentum exchange between the wall and f BB ijk due to a wall velocity u w . One concept for wall models based on BBSs originally refers to the Neumann condition proposed in Ref. 75, depicted in Fig. 1(b). Here, a wall-normal stress r nt 0 is

ARTICLE
scitation.org/journal/phf imposed based on a first-order approximation of the wall-normal gradient of the stream-wise velocity, Given a wall shear stress r nt 0 , Eq. (21) can be solved for u w and inserted in Eq. (20). 75 Nishimura et al. 65 combined this approach with Spalding's law to compute r nt 0 . Furthermore, was substituted with the total viscosity T in order to account for the shear due to the eddyviscosity t . Unfortunately, no extensive analyses of the approach in boundary layer flows have been presented. However, the dependency of u w on t appears problematic. Eventually, SGS models with appropriate near-wall scaling tend to predict near-zero eddy viscosities at the wall, hence, leading to very large velocity gradients and possibly reversed wall velocities. Problems of this type have been reported for slip-velocity-based wall models in classical finite-volume Navier-Stokes approaches. 76

Immersed virtual wall method
Another approach is the recently proposed immersed virtual wall (IVW) method. 40 Here, the numerical solid boundary is immersed in the wall, separated from the actual wall by fluid nodes of a virtual wall layer. Figure 1(c) illustrates the concept of this virtual wall layer with an exemplary thickness of Dx. The first off-wall nodes x 1 thus do not border solid nodes. On the other hand, a standard no-slip boundary condition is applied to the fluid nodes at the immersed virtual wall following a standard BBS. In order to enforce the wall shear stress predicted by a wall function a body force F wm is applied at x 1 . To that end, the viscous shear force between x 1 and the virtual wall layer F is computed via the momentum exchange method (MEM). F wm then accounts for the difference between F and the force due to the predicted wall shear stress.

B. A new wall model
The large variety of LBM wall models as well as ongoing debates of the topic indicate that no approach was yet found to be optimal for all applications of wall-modeled LBM-LES. In this work, we seek for an equilibrium-stress model that imposes the wall shear stress at the boundary using MOST and the resolved velocity sampled in the surface layer, similarly to the classical LES. 77

The inverse momentum exchange method
Our aim is to impose a desired wall-stress via a manipulation of the wall velocity u w in BBSs, while avoiding a dependency on the eddy viscosity as in Ref. 65. Starting point of the approach is the MEM. In the MEM, the momentum exchanged at a fluid-solid link during the advection step is given by where f ijk will depend upon the BBS. [78][79][80][81] The total force exchanged at the fluid-solid interface of x 1 thus amounts to with C being the set of lattice directions intersecting the solid boundary. Inserting Eqs. (20) and (22) into Eq. (23), we can separate the total force into a part directly related to the bounce-back of particle distributions F f and a part related to the wall velocity F uw , respectively: In the wall-modeled LES, we intend to prescribe the local wall shear stress s w based on wall functions like MOST. This implies that where A 1 is the surface area intersecting the unit cell of x 1 . To this end, we propose to invert the MEM, i.e., we do not compute the resulting force of a bounce-back given F f and a certain u w , but we incorporate the MEM in the bounce-back scheme in order to set a wall velocity u w that satisfies Eq. (27). We shall hereafter refer to this approach as the inverse momentum exchange method (iMEM). More specifically, we initially compute the force exerted by a wall at rest F f following Eq. (24). Substituting F with F wm in Eq. (26), we can now solve for a wall velocity u w that satisfies (20)] completing the modified bounce-back scheme. From Eq. (25), it becomes obvious that the required expression t w ðx 1 ; F uw Þ ¼ u w depends on the utilized lattice as well as C. Hence, the explicit form of t w can differ between different boundary nodes in the case of complex geometries. Still, the system of equations to be solved for u w remains determined as long as F uw can be constructed as a linear combination of e ijk ; ijk 2 C. For simple grid-aligned boundaries, the former does obviously not apply. As for the specific case of this study, namely, a flat surface with wall normal vector n ¼ Àe z and a D3Q27 lattice, we find The entire wall modeling approach for each respective boundary node x 1 can be summarized as follows: Pre-processing 1. Determine t w and A 1 .
2. Identify nodes for the interpolation of u el at x el .
2. Compute f BB ijk following a BBS. 3. Compute F f and F wm using Eqs. (24) and (27), respectively. 4. Compute u w with t w ðx 1 ; F uw Þ. 5. Add f uw ijk to f BB ijk and update f ijk . As shown above, the iMEM wall modeling approach can be easily incorporated into existing BBS as it merely requires the additional Physics of Fluids ARTICLE scitation.org/journal/phf computation of u w . Furthermore, it avoids any dependency on t since s w is imposed without a preceding approximation via the wall-normal velocity gradient.

Estimating the wall shear stress
The iMEM and other methods outlined above chiefly differ in how the wall shear stress is incorporated in the boundary condition. Another crucial component of a wall model is the estimation of the wall shear stress itself. Two main aspects come into play, namely, the wall normal distance of the exchange location z el and the filtering/ averaging of u el .
As for the former, the most obvious choice is to sample u el at the first off-wall node (z el ¼ z 1 ), see, e.g., Refs. 77 and 82. In the first place, this is arguably desirable since the velocities further off the wall can be increasingly uncorrelated with the surface fluxes, compromising the validity of the wall function. This can particularly be the case for stratified flows, complex geometries or, for instance, in the presence other boundary layer perturbations, such as wind turbine wakes. On the other hand, the turbulent fluxes close to the wall are typically underresolved and therefore prone to numerical and SGS-modeling errors. These surface layer modeling errors are thus additionally coupled to the outer layer flow via the wall model evoking problems, such as loglayer mismatch (LLM). Kawai and Larsson, 63 therefore, argued to sample u el at the second off-wall node or beyond. This practice has been adopted across various numerical models 64,83,84 including most of the aforementioned wall-modeled LBM approaches.
If and how to filter or average u el has been discussed in the literature with similar intensity. In the context of ABLs, the general necessity to average u el originates from the fact that MOST was developed and validated in an averaged sense. Yet, the velocity u el sampled at the exchange location fluctuates. Using instantaneous local values thus leads to an overprediction of the mean shear stress as can be inferred from Schwartz inequality (hũ 2 i > hũi 2 , where hÁi denotes a spatial averaging). 77 A common method to overcome this problem is the Schumann-Gr€ otzbach (SG) model. 85,86 Here, an average wall shear stress hs w i is computed based on the planar-averaged velocity huðz 1 Þi. The wall model then applies a local instantaneous deviation from the average shear stress that is approximated as a linear function of the local resolved velocity, s w ðx; y; tÞ ¼ hs w iũ ðx; y; z 1 ; tÞ huðz 1 Þi : Recently, Maronga et al. 84  Among others, the studies mentioned above show that the overall sensitivity to z el and the averaging of u el , respectively, can be largely dependent on the numerical scheme. In Sec. V, we discuss the impact of some of the aforementioned approaches for the wall shear stress estimation, namely, the SG and ESG approach. Yet, as opposed to the original methods we replace the planar averaging in Eq. (29) with a temporal averaging using Eq. (30). This modification thus retains the aforementioned benefits of using averaged quantities while being local in space.

IV. NUMERICAL SETUP AND CASE DESCRIPTION
For the simulations discussed in this work, we employ a GPU-based LB framework. 89 All floating point operations are performed in single-precision. Distribution functions are formulated in their well-conditioned form f wc ijk ¼ f ijk À w ijk in order to reduce the sensitivity of the scheme to round-off errors. 27,57 The forward and backward cumulant transformations closely follow the original description as in Ref. 27.
We simulate the canonical test case of a simplified neutral ABL in an open-channel flow setup, i.e., an isothermal turbulent boundary layer above a horizontally homogeneous rough surface. Coriolis forces are neglected for this case, and the flow is driven by an imposed constant pressure gradient @p=@x ¼ qu 2 Ã =H, where H ¼ 1000 m is the domain height and u Ã ¼ 0:4 ms À1 . The pressure gradient thus determines the equilibrium wall shear stress and the associated velocity length scale u Ã . The roughness length is set to z 0 =H ¼ 10 À4 . Similar to Refs. 90 and 91, the domain measures L x ¼ 6 H; L y ¼ 4 H and L z ¼ H in the stream-wise, lateral, and vertical direction, respectively. Periodic boundary conditions are employed in x and y. A zero-stress no penetration boundary condition (@ũ=@z ¼ @ṽ=@z ¼ 0;w ¼ 0) is applied at the top. The latter is realized as a simple bounce-forward scheme. 51 At the bottom, a SBB is applied that is coupled to the iMEM wall model as described in Sec. III B 1. Sub-grid scales are modeled by the AMD model. The model constant is set to C x;y;z ¼ 1=12. All simulations are run at a Mach number Ma ¼ 0:1.
The flow field is initialized with the equilibrium mean velocity distribution including sinusoidal perturbations inw. Each case is runup for 50 eddy-turnover times T Ã ¼ H=u Ã . Subsequently, data are collected over another 150 T Ã while full snapshots of the domain are taken every 0:2 T Ã .

V. IMPACT OF WALL SHEAR STRESS MODEL
The impact of the different methods to calculate the wall shear stress is compared at a moderate spatial resolution of n z ¼ 96 grid points along the vertical. In the comparison, we include the vanilla version of the wall model with z el ¼ z 1 and no averaging of u wm . In line with Ref. 84, the latter will hereafter be referred to as the instantaneous logarithm method (IL). Furthermore, we consider the SG and ESG model with a filter timescale T f ¼ 10 Dt c . The exchange location in the ESG model is set to z el ¼ 2 Dz.

A. Log-layer mismatch
The mean stream-wise velocity hũi, the non-dimensional velocity gradient / ¼ jz The main difference is a minor increase in the entire velocity profile while the velocity gradient remains almost unaffected. Computing the wall shear stress from the instantaneous velocity typically entails an overprediction thereof as outlined in Sec. III B 2. Given a fixed driving pressure gradient, the underprediction of the velocity at the exchange location (i.e., the first grid point) with the IL can thus be inferred from the argument of global momentum conservation. In turn, as the velocity underprediction with the IL is small, it can be expected that the SG model has only little effect on the velocity profile. Indeed, the effectiveness of filtering/averaging in the context of wall models appears highly model-dependent. Most of the studies mentioned in Sec. III B 2 do report beneficial effects on problems, such as LLM and the gradient overshoot. In contrast, others show negligible effects, similarly as here, and therefore even refrain from averaging despite the theoretical arguments for it. 83,88 Based on the theoretical arguments for the averaging, it can be expected that its effectiveness, and hence the need for it, will correlate with the amount of resolved turbulence at the exchange location: the higher the stream-wise velocity fluctuation at the exchange location, the higher the wall shear stress overprediction without averaging. When compared to other LES results shown in Sec. VI, the resolved shear stress at the first grid point [see Fig. 2(c)] does indeed appear rather low. With the ESG, the entire velocity profile is shifted downward in comparison with the IL. The LLM in the surface layer (z=H Շ 0:1) is thus significantly reduced in exchange for a notable underprediction of the velocity at the first grid point. The velocity gradient, on the other hand, is again only marginally affected. This refers to both the nearwall overshoot and the consistent overestimation in the surface layer. Small deviations from the log-profile therefore inevitably remain even above the first grid point. In that sense, the effect of the ESG model is well in line with the original motivation by Kawai and Larsson 63 to sample beyond the first grid point. Namely, the approach does not resolve the underlying problem for the LLM, i.e., a poorly resolved flow at the first grid point bringing along the observed overshoot in the velocity gradient. But, it can be an effective measure to reduce the LLM in the bulk.

B. Statistics of the wall shear stress
The probability distribution functions (PDFs) of the instantaneous predicted friction velocity of the wall model u wm Ã are compared in Fig. 3. The corresponding mean l, standard deviation r, skewness S and flatness (kurtosis) F are provided in Table I. The mean l of the IL lies below the theoretical equilibrium value of u Ã . At the same time, it lies above the peak of the PDF due to the distinct positive skewness of the distribution. The flatness is close to 3 implying near-Gaussian occurrences of extreme values. With the SG the mean is close to u Ã and the distribution is less skewed. The flatness is slightly lower than for the IL but still close to Gaussian. The ESG leads to a further decrease in the skewness along with a lower flatness. The observed trends in the statistics of u wm Ã are generally well in line with other studies of the same or similar wall stress models. See, Refs. 84 and 92. One explanation for the positive skewness of the IL can be its linear dependency onũ 1 which itself is typically positively skewed (see, for instance, Fig. 7). The filtering of the SG and ESG reduces the

ARTICLE
scitation.org/journal/phf occurrence of extreme values. A narrower, less skewed distribution than with the IL can thus be expected. The differences between u wm Ã of the SG and ESG can again be related to the differences in the statistics of u el , i.e., uðz 1 Þ and uðz 2 Þ, respectively. With the increasing height, both skewness and flatness of the stream-wise velocity tend to decrease. Thus, despite the filtering some of these characteristics remain in u wm Ã . Interestingly, the differences in the statistics of u wm Ã only mildly affect the near-wall resolved turbulence. Figure 2(c), for instance, indicates that the resolved near-wall shear stress hu 0 w 0 i is only weakly affected by the choice of wall shear stress model. The IL and the SG model show only slightly elevated resolved shear stresses when compared to the ESG model. Similarly, low effects are found for the velocity variances (not shown for the sake of brevity). The low impact of the wall shear stress model on hu 0 w 0 i might also explain the small differences in /. Brasseur and Wei 93 showed that the gradient overshoot largely depends on the ratio < ¼ hu 0 w 0 i=s xz . Increasing hu 0 w 0 i in order to decrease the overshoot problem thus seems to require other measures, e.g., the use of different SGS models. Yet, further investigations thereof go beyond the scope of this study.

VI. GRID SENSITIVITY
In the following, we explore the grid sensitivity of the presented CLBM solver. Based on the findings of Sec. V, we limit the discussion to the ESG model with the settings given in Sec. V. Four grid resolutions n z ¼ f64; 96; 128; 160g are considered in the comparison. In addition to MOST and other scaling laws, the results are compared to the available statistics of two LES references of the same canonical test case. Both are obtained from a staggered pseudo-spectral-finitedifference (PSFD) solver using a pseudo-spectral discretization in the horizontal directions and a second-order finite difference scheme in the vertical. PSFD solvers are the most well-established numerical approach for ABL simulations due to their low numerical dissipation and remain the go-to benchmark for other numerical approaches. 91 The first reference case by Gadde et al. 61 refers to a vertical resolution in between the two highest ones presented here, i.e., n z ¼ 144, and uses the same SGS model as in this study, the AMD model. The second reference case by Stevens et al. 12 refers to a significantly higher vertical resolution of n z ¼ 256 using the arguably more advanced Lagrangian-averaged scale-dependent (LASD) dynamic Smagorinsky SGS model. 77 The two cases will hereafter be referred to as PSFD AMD 144 and PSFD LASD 256 , respectively. The respective aspect ratios Dx=Dz (note that Dx ¼ Dy for all cases) of the two cases are 2p and p. Particularly in PSFD AMD 144 , the aspect ratio is thus notably higher than in the LBM cases where Dx=Dz is inherently one due to the cubic unit cell of the lattice. Still, this discrepancy is considered minor for the following comparison, as Dx=Dz ¼ 2p states the typical choice for boundary layer simulations using PSFD approaches. 91 Major sensitives to lower aspect ratios in this reference case are therefore not anticipated. Further details on the specific numerical setup of the reference cases are contrasted against the CLBM setup in Table II.

A. Mean velocities and shear stress
The mean stream-wise velocity, non-dimensional velocity gradient, and the resolved and modeled shear stresses are shown in Fig. 4 in comparison with the phenomenological profiles as well as PSFD AMD 144 . The considered grid resolutions are in reasonable agreement with the logarithmic velocity profile in the surface layer. The underprediction of the velocity at the first grid point persists regardless of the grid resolution. Figure 4(b) reveals that the overshoot in the velocity gradient even increases when increasing the grid resolution. Yet, for all resolutions the overshoot remains limited to the second grid point. Further aloft / quickly recovers to values closer to the theoretical surface layer value of 1 while closely following the curve of PSFD AMD 144 . In PSFD AMD 144 , a small overshoot can also be observed at the second grid point, yet of significantly lower magnitude.
With hu 0 w 0 i=u 2 Ã $ 0:35, the resolved shear stress at the first grid point lies well below the reference case for all grid resolutions. However, the ratio of resolved to modeled shear stress increases notably faster along the vertical in the CLBM simulations than in the PSFD AMD 144 case. From the second grid point onward, all CLBM cases exhibit more than 90% resolved shear stress. The latter can be attributed to the larger grid spacing in the horizontal directions in the reference case as well as the different choice of model coefficient C i .
Explanations and remedies for the overshoot problem have been discussed in various contexts, such as wall models and surface boundary conditions, 83,88 wall shear stress models (as discussed in Secs. III B 2 and V) as well as SGS models. 77,82 A more general explanation is due to Brasseur and Wei. 93 The authors showed that the spurious gradient overshoot in high Re rough-wall LES arises from similar physics as the naturally occurring peak in the gradient in the viscous sub-layer of smooth-wall boundary layer flows. There, the peak coincides with the transition region of viscous to turbulent stress dominated flow regions. In rough-wall LES, the near-wall eddy viscosity (or numerical dissipation) can evoke an artificial viscous layer with similar flow characteristics. Analogously, the spurious overshoot is typically found at z j hu 0 w 0 i ' s xz . Therefore, one necessary criterion for the reduction of the overshoot is hu 0 w 0 i ! s xz at z 1 as it pushes the transition region, and hence the overshoot, below the first grid point. Following this reasoning, the trends in the overshoot of the CLBM

ARTICLE
scitation.org/journal/phf results (overshoot at second grid point, increasing with increasing n z ) are in line with the observations of the corresponding shear stresses (transition to hu 0 w 0 i > s xz between first and second grid point). The same applies to the lower overshoot found in PSFD AMD 144 . Given the comparably large fraction of resolved turbulent shear stress from z 2 onward (as well as further results discussed later in this section), it does not appear that the CLBM approach is particularly overdissipative when compared to PSFD AMD 144 . It is therefore assumed that the reasons for the rather low hu 0 w 0 iðz 1 Þ lie elsewhere. For instance, in the CLBM the wall stress s w is directly encoded in the distribution functions coming from the wall. Both horizontal and vertical velocity components at the first off-wall node are thus directly affected by the modeled shear stress term of the wall model, as can be inferred from Eqs. (20) and (28). On the other hand, on the staggered grid in the PSFD solver, s w is solely prescribed on w-nodes that coincide with the exact location of the wall. The horizontal velocity components u and v at the first node in the bulk (z ¼ Dz=2) are thus only affected by s w via the stress divergence term. The main difference to the PSFD cases is again the resolved turbulence at the first grid node. Furthermore, n z > 64 exhibit somewhat larger hu 0 u 0 i at 0:05 z=H 0:2. However, in the lower part of the surface layer the agreement between the CLBM and PSFD LASD 256 is significantly better [see also Fig. 6(a)]. Furthermore, it is worth mentioning that simulations by Gadde et al. 61 with the same setup but using the LASD (not shown here for the sake of clarity) display a much closer agreement with PSFD LASD 256 in the outer layer. Hence, we can assume that the observed disagreements are largely due to the AMD model rather than the horizontal or vertical grid resolution or other numerical differences.

B. Variances
As for the lateral variance, all shown cases are in somewhat close agreement. Again, the largest differences between the CLBM and PSFD results can be found close to the surface.
The impact of the grid resolution on the vertical variance is similarly small as for the horizontal components. With the increasing grid resolution, only the peak of the vertical variance in the surface layer increases slightly. For all CLBM cases, this peak is more pronounced than in PSFD LASD 256 . On the other hand, the magnitude of hw 0 w 0 i in the

Physics of Fluids
ARTICLE scitation.org/journal/phf outer layer agrees well with PSFD LASD 256 . Interestingly, PSFD AMD 144 consistently underpredicts hw 0 w 0 i as compared to PSFD LASD 256 . Moreover, the CLBM cases and PSFD AMD 144 show a slightly concave profile of hw 0 w 0 i in the lower outer layer. In contrast, PSFD LASD 256 and other cases of even higher resolution discussed in Ref. 12 show an almost linear decrease in hw 0 w 0 i throughout the outer layer.
In 1970s and 1980s, Townsend 94 and others derived models for the Reynolds-stress tensor based on the attached-eddy hypothesis. These similarity laws predict a logarithmic behavior for the streamwise velocity fluctuations in the surface layer, with where d is an outer length scale and A 1 ¼ 1:25 is the Townsend-Perry constant. Within the last two decades, highly resolved experimental data as well as LES finally confirmed parts of these formulations, i.e., A 1 % 1:25 for 0:04 Շ z=H Շ 0:23; see, e.g., Refs. 12 and 95. On the other hand, the constant B 1 appears to be case dependent, not following a particular self-similarity. 12 As to further scrutinize the variances, we compare hu 0 u 0 i of the discussed cases against Eq. (31) in Fig. 6. For the sake of completeness, we present additional smooth-wall experimental data (digitized by the authors and hereafter referred to as WT 19k ) from Ref. 95  To gain further insights into the characteristics of the stream-wise variance, Fig. 6(b) shows the local slope A 1 of the different cases. The A 1 profiles confirm that neither the CLBM cases nor PSFD AMD 144 succeed to develop a clear logarithmic profile with the appropriate slope. Still, with the increasing grid resolution the 1.25-crossing of the CLBM cases shifts toward the wall. This observation generally agrees with the work by Stevens et al. 12 Interestingly, already the n z ¼ 96 case performs slightly better overall in terms of A 1 than PSFD AMD 144 . One reason for this might be the higher horizontal resolution in the LBM case.

C. Higher-order moments
The aforementioned discussion by Bou-Zeid 7 emphasizes the importance of higher-order statistics for the evaluation of LES quality. In this respect, Giacomini and Giometto, 91 for example, showed significant discrepancies between second-order finite-volume and PSFD approaches in the third-and fourth-order moments related to the overly dissipative nature of the former.
In Fig. 7, we compare the skewness S and flatness F of the three velocity components. We generally find a positive stream-wise skewness in the surface vicinity indicating an increased likelihood of negative velocity fluctuations. Throughout the surface layer, Sðu 0 Þ decreases almost linearly in logðz=HÞ toward consistently negative values in the outer layer. Following Marusic and Meneveau, 96 this characteristic change in the sign of Sðu 0 Þ appears to be due to the presence of large-scale motions (LSMs) in the surface layer, i.e., elongated streaks of negative velocity fluctuations. These superstructures are typically associated with an amplitude modulation that attenuates small-scale fluctuations close to the wall while amplifying them further aloft. The zero-crossing height of Sðu 0 Þ in the CLBM cases and PSFD AMD 144 is slightly elevated when compared to PSFD LASD 256 . Furthermore, with increasing grid resolution the CLBM results show a slight tendency for lower near-surface skewness than the PSFD cases. In the transition from the surface to the outer layer (0:08 Շ z=H Շ 0:3), the skewness in PSFD LASD 256 remains rather constant with Sðu 0 Þ % À0:1. On the other hand, in the CLBM cases and PSFD AMD 144 it more continuously decreases. The former though tends to agree more with experimental observations as shown, for instance, in Refs. 96 and 97. Still, the overall degree of agreement in the aforementioned characteristics can arguably be appreciated, particularly for the lower resolution CLBM cases. As for the flatness of the stream-wise velocity, all cases consistently predict a sub-Gaussian distribution for z=H > 0:01. Toward the wall, Fðu 0 Þ increases reaching Gaussian to super-Gaussian values depending on the grid resolution. Large flatnesses toward the wall are again associated with the increased intermittency originating from the amplitude modulation of large-scale stream-wise streaks. 98  The vertical velocity skewness is consistently found to increase throughout the boundary layer. Solely the first grid point in the CLBM cases deviates from this trend, which is believed to be an artifact of the wall model. The zero-crossing of Sðw 0 Þ above z 1 though shifts toward the wall with increasing grid resolution. A trend that has similarly been reported in other LES studies. 11,12 Moreover, field measurements of near-neutral ABLs confirm the depicted outer-layer behavior of an increasing vertical velocity skewness with Sðw 0 Þ % 0:3 at z=H % 0:3. 99 The same study also reports flatness factors in the range of 3-4 in the outer layer with a tendency to increase with height. Here again, only the first grid point in the CLBM cases stands out with somewhat higher flatness than in the bulk. As to that, it is also worth mentioning that the different wall shear stress models discussed in Sec. V are found to have little to no impact on the skewness and flatness of w 0 at z 1 . It is, therefore, believed that this matter similarly relates the numerics of the wall model as discussed in Sec. VI A for hu 0 w 0 iðz 1 Þ.

D. One-dimensional velocity spectra and auto-correlations
In Fig. 8, we compare the stream-wise one-dimensional spectra of the stream-wise and vertical velocity at various heights, E uu and E ww , respectively. The spectra are obtained from the Fouriertransform of the respective velocity component and then averaged in space and time. E uu and E ww are normalized by u 2 Ã z and plotted against the stream-wise wavenumber j x multiplied by z. With the aforementioned normalization, we expect a collapse of E uu in the inertial subrange (j x z > 1) following the phenomenological j À5=3 -scaling as well as j À1 -scaling in the production range (j x z < 1). 82 With n z ¼ 64, only small regions in the stream-wise spectra of the CLBM adhere to the Kolmogorov scaling. As a consequence, the overlap in the inertial subrange is rather limited. Generally, the decay of larger scales is too small as indicated by a lower slope than the expected À5=3. At higher wavenumbers, the spectra peel off at significantly larger slopes due to the non-negligible numerical dissipation of the scheme. Naturally, the latter is not observed in PSFD AMD 144 [see Fig. 8(e)] thanks to the spectral accuracy of the solver. In the near-wall region, the low-resolution CLBM case also underpredict the production-range scaling. The spectra of the vertical velocity [ Fig. 8(f)] collapse marginally better in the inertial subrange since the peel-off at high wavenumbers is less pronounced. At lower wavenumbers, E ww flattens out as expected from both theory 94 and experimental observations. 100 The overall energy at the lowest grid point appears significantly elevated when compared to the higher levels. Here, we would typically expect a more continuous increase in the resolved turbulent kinetic energies and even a collapse of the spectra a low wavenumbers. The latter can partially be observed in PSFD AMD 144 [see Fig. 8(h)] as well as other experimental 100 and numerical data. 12,101 With the increasing grid resolution, the collapse of E uu in the inertial subrange improves notably as the slopes become steeper at most heights. A steepening can also be observed at scales larger than the wall distance (j x z < 1). The match with the production range scaling thus also improves. With n z ! 128 the Kolmogorov scaling is indeed only marginally worse than in PSFD AMD 144 when disregarding the high-wavenumber peel-off. As for E ww , it can be appreciated that the energy difference between the first and second nodes decreases at higher resolutions. Apart from that, a steeping and improved overlap is particularly seen in the larger scales of the inertial subrange.

ARTICLE
scitation.org/journal/phf Further insights into the spatial coherence of the flow can be obtained from the one-dimensional spatial auto-correlations as well as the instantaneous snapshots of the stream-wise velocity fluctuation depicted in Figs. 9 and 10, respectively. In the stream-wise direction, R uu decays at a remarkably similar rate in all presented cases. A slightly faster decay of the shorter lengthscales can be observed for n z ¼ 64.
On the other hand, the correlation at larger length scales becomes smaller with increasing grid resolution. In all cases, u 0 remains correlated over the available stream-wise range r x . This indicates that the present domain is not large enough to accommodate the largest turbulent structures, such as the aforementioned LSMs or even larger VLSMs (very-large scale motions) reaching lengths of up to 20H. [102][103][104] In shorter domains, these streaks lack the sufficient longitudinal space to breakup. As a consequence, they get recycled at the inlet and become infinitely long. Still, various authors stress that the issue is typically not found to affect the convergence of horizontally and time averaged statistics. 105,106 In Fig. 10, the streaks can be recognized as elongated structures of negative velocity fluctuations flanked by associated high-speed regions. The cross-wise correlation shows the expected alternation of negative to positive values due to the presence of the aforementioned streaks. Unfortunately, Fig. 9(b) reveals that a direct comparison of R uu ðr y Þ to PSFD AMD 144 is not straightforward due to the different lateral dimensions of the domains. It seems that the larger span-wise extent of the domain leads to a more pronounced meandering of the stream-wise streaks resulting in wider and more pronounced curves of R uu in PSFD AMD 144 . The latter is qualitatively reproduced in Fig. 10. We, therefore, provide additional results from Giacomini and Giacomo 91 using the same PSFD solver with n z ¼ 128,

Physics of Fluids
ARTICLE scitation.org/journal/phf a standard Smagorinsky model with C s ¼ 0:1, and a more similar lateral domain extent L y of 4 3 p H. The results will be referred to as PSFD Smag 128 . As shown in Fig. 9(b), the agreement with PSFD Smag 128 is indeed notably better corroborating the aforesaid conjecture. Further investigations of the effect of LSMs and VLSMs do, however, go beyond the scope of this work. The interested reader is referred to Refs. 42, 102, and 105. Finally, we shall look upon the sole presence of the streaks from a numerical point of view. In that regard, it can be appreciated that the CLBM is able to sustain these structures at all investigated grid resolutions. Note, for instance, that a selection of second-order finite-volume Navier-Stokes approaches presented in Ref. 91 failed to do so due to excessive numerical dissipation. The latter showed in a significantly faster longitudinal decay of the stream-wise correlation as well as a reduced lateral extent of non-zero cross-wise correlation.

VII. CONCLUSION
In the presented study, we employed a cumulant lattice Boltzmann approach for LES of neutral ABL flows. Wall-resolved LES or direct numerical simulation (DNS) of, for instance, smooth-wall channel flows at moderate Reynolds number have been excessively discussed over the past two decades, 107-109 followed by wall-modeled LES in more recent years. 37,40 To our best knowledge, this states one of the first comprehensive LBM studies of wall-modeled LES of roughwall boundary layer flows at very large ("infinite") 12,82 Reynolds number.
Initially, we reviewed existing wall modeling approaches for the LBM. Based thereupon, we presented a novel method that can be readily coupled to common bounce-back schemes. The novel approach, herein referred to as iMEM (inverse Momentum Exchange Method), enforces a modeled wall shear stress at the first off-wall grid point via the wall velocity. To this end, the wall velocity is adjusted such that the total wall-parallel force exerted by the wall on the first off-wall grid point refers to the modeled wall shear stress multiplied by the wallnormal area. Thereby, the approach avoids any dependency on the eddy viscosity as in velocity-gradient-based slip velocity approaches.
At the same time, the velocity at the first off-wall nodes is not fully prescribed as in wall modeling approaches based on the WNBCs. In that sense, the iMEM remains close to classical equilibrium wall models used in Navier-Stokes approaches.
The wall-modeled LBM LES approach was applied to the canonical test-case of an isothermal pressure-driven rough-wall boundary layer capped by a solid lid. An initial comparison of different models for the wall shear stress revealed only little effect of the filtering applied to the input velocity. On the other hand, the wall-normal distance of the exchange location was shown to largely affect the overall magnitude of the velocity profile. It is, therefore, found to be an effective measure to reduce the occurring LLM, in line with the original work by Kawai and Larsson 63 and others. 64,87 Still, it should be stressed that the velocity-gradient overshoot at the second grid point was almost unaffected by the measure. A reduction thereof thus remains desirable as it appears to be the primary cause for the LLM. Remedial measures might be found in the use of different SGS models or additional nearwall corrections to the utilized AMD model. Whitmore et al. 88 for instance, showed that the near-wall eddy viscosity of the AMD model still largely surpassed that of dynamic Smagorinsky models despite the otherwise good performance of the former. Near-wall velocity fluctuations are thus exceedingly damped. Then again, low near-wall turbulence intensities and particularly resolved turbulent shear stresses were found to be a main cause for the gradient overshoot. 90,93 Other measures to increase the near-wall fluctuations can be the use of stochastic backscatter models 110 or wall models allowing for transpiration (wallnormal velocities). 111,112 Particularly, the latter appears as a suitable candidate to be incorporated in the iMEM in its presented form.
A more detailed analysis of the flow statistics was presented for the ESG wall shear stress model with different grid resolutions. The CLBM cases were contrasted against several solutions of wellestablished PSFD solvers, phenomenological scaling laws, as well as experimental results. The overall agreement with the considered references renders the (parametrized) CLBM as a promising method for LES of ABL flows. This particularly refers the presented higher-order moments and velocity spectra. We generally associate the good

Physics of Fluids
ARTICLE scitation.org/journal/phf agreement in the presented statistics with the low numerical diffusivity of the parametrized CLBM. Note, for instance, that the second-order accurate regularized CLBM (not shown for the sake of brevity) captured all turbulence statistics significantly worse, including a more pronounce gradient-overshoot. This observation is again well in line with the aforementioned study by Giacomini and Giacomo 91 on the suitability of second-order finite-volume Navier-Stokes approaches for ABL simulations. In that respect, future studies should attend to similar investigations of other collision models. Mind that the parametrized CLBM still states the only LBM approach with fourth-order spatial accuracy of the viscous term on standard mono-speed lattices. 57 Furthermore, a comprehensive comparison of the iMEM to other LBM wall-models should be considered in future studies. Admittedly, the motivation for the introduction and use of the iMEM in this study was solely based on the theoretical arguments. After all, experiences from Navier-Stokes-based LES show that the suitability of near-wall treatments can be somewhat case dependent. In summary, this work illustrates the overall suitability of the presented wall-modeled LES CLBM approach for ABL simulations. Complementing previous studies, 22,34 this underlines the potential of the method, not only for fundamental boundary layer studies, but also for related engineering-driven applications as found in wind energy or other fields of wind engineering. Finally, this potential is best underlined in terms of the computational performance. Most cases presented in this work ran on a single GPU (Nvidia RTX 2080 Titan) with an average performance of about 900 MNUPS (Million Node Updated Per Second). For the moderate resolution case (n z ¼ 96) with about 20 Â 10 6 grid points, this refers to a wall time of about 900 s=T Ã (with T Ã ¼ 2500 s). Beyond real-time, LES of ABL flows at reasonable grid resolutions thus becomes affordable.

ACKNOWLEDGMENTS
We would like to thank Srinidhi N. Gadde for the fruitful discussions and for providing the PSFD reference data. Our thanks also go to Christoph Niedermeier and Jasper Behrensdorf for supporting us with the high-resolution runs.
The majority of simulations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputing Center (NSC) under the Project No. SNIC 2020/1-10. Their support is gratefully acknowledged.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.