A bibliometric analysis of battery research with the BATTERY 2030+ roadmap as point of departure

In this bibliometric study, we analyze the six battery research subfields identified in the BATTERY 2030+ roadmap: Battery Interface Genome, Materials Acceleration Platform, Recyclability, Smart functionalities: Self-healing, Smart functionalities: Sensing, and Manufacturability. In addition, we analyze the research field related to BATTERY 2030+ as a whole, using two operationalizations. We (a) evaluate the European standing in the subfields and in the BATTERY 2030+ field in comparison to the rest of the world, and (b) identify strongholds of the subfields and of the BATTERY 2030+ field across Europe. For each subfield and for the field as a whole, we used seed articles, i.e. articles listed in the BATTERY 2030+ roadmap or cited by such articles, to retrieve additional, similar articles located in an algorithmically obtained classification system. The outputs of the analysis are publication volumes, field normalized citation impact values with comparisons between countries/country aggregates and between organizations, co-publishing networks between countries and between organizations, and keyword co-occurrence networks. Regarding (a), the citation impact of EU & associated (countries) for the field as a whole is similar to that of China and of the aggregate Japan-South Korea-Singapore, and well below that of North America. The subfields Battery Interface Genome and Recyclability are, however, exceptions. Regarding (b), the publication volumes of EU & associated organizations vary considerably across the subfields. With respect to citation impact, examples of high-performing EU & associated organizations are ETH Zurich and the Max Planck Society for the Advancement of Science.


Introduction
Uppsala University coordinates BATTERY 2030+, a large-scale research initiative funded under the EU Horizon 2020 program, which started on September 1, 2020. The project is a continuation of a previous project that recently published a battery research roadmap. One of the aims of the BATTERY 2030+ initiative is to monitor progress towards the goals set out in the roadmap, as well as emerging areas, opportunities and challenges. The monitoring includes two bibliometric analyses of European and international battery research subfields: the analysis described in this report, and a second analysis to be carried out at the end of the project.
In this report, we treat the six battery research subfields identified in the BATTERY 2030+ roadmap: Battery Interface Genome (BIG), Materials Acceleration Platform (MAP), Recyclability, Smart functionalities: Self-healing, Smart functionalities: Sensing, and Manufacturability. In addition, we analyze the BATTERY 2030+ field as a whole. The overarching aims of the analysis are (a) to evaluate the European standing in the subfields and in the BATTERY 2030+ field in comparison to the rest of the world, and (b) to identify strongholds of the subfields and of the BATTERY 2030+ field across Europe.
The outputs of the analysis are:
• Publication volumes.
• Field normalized citation impact values, with comparisons between countries/country aggregates and between organizations.
• Co-publishing networks, both between countries and between organizations.
• Keyword co-occurrence networks.
The countries/country aggregates referred to above and used in the report are defined in Table 1. The remainder of this report is structured as follows. Section 2 treats data and methods, and Section 3 reports the results of the analysis. In Section 4, we reflect on the results, discuss limitations and draw conclusions.

Data and methods
In this section, the main data source of the analysis is described, as well as the methods used.

Data source
The data source of the analysis is the KTH Library database Bibmet, a relational database that constitutes a bibliometric version of Web of Science (WoS). Bibmet contains about 64 million publications, from publication year 1980 onward, and is updated quarterly. The publication period of the analysis is 2010-2019, and the WoS document types taken into account are "Article" and "Review". In the remainder of this report, we use the term "article" for both articles and reviews.
Bibmet includes a classification system, obtained algorithmically using the methodology proposed by Waltman and van Eck (2012). The system is hierarchical with four levels of clusters, and at each level the clusters are pairwise disjoint. Only articles are clustered, based on the direct citation relations between them, and the clustering technique is similar to modularity-based clustering (Newman 2004a, 2004b). The system includes 35.7 million articles. Each cluster, regardless of hierarchical level, has been algorithmically assigned three labels, where a label is an author keyword, a journal name, a WoS subject category name or a word derived from author addresses. The labels indicate the subject orientation of the clusters.
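As a hedged illustration of the quality criterion behind modularity-based clustering, the sketch below computes Newman's modularity Q for a given partition of a small, hypothetical citation graph in which direct citations are treated as undirected links. It does not reproduce the Bibmet clustering algorithm of Waltman and van Eck (2012), which optimizes a related function over millions of articles; the graph, node ids and cluster names are invented for the example.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, e.g. direct citation links treated as undirected.
    cluster_of: dict mapping each node to its cluster id.
    Q = sum over clusters c of L_c / m - (d_c / (2m))^2, where m is the number
    of edges, L_c the number of edges inside c and d_c the total degree of c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single citation link (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {n: "a" for n in range(6)}
print(modularity(edges, two_clusters))  # 5/14, about 0.357
print(modularity(edges, one_cluster))   # 0.0
```

Splitting the graph along its sparse link gives a clearly positive Q, whereas putting every node in one cluster gives Q = 0, which is why maximizing such a criterion tends to recover densely citing groups of articles.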

Article sets for the six subfields and the BATTERY 2030+ field
For each of the six subfields, and for the BATTERY 2030+ field as a whole, we used the classification system to define a set of articles to analyze. The BATTERY 2030+ roadmap includes a publication list for each subfield. These lists were used as starting points for defining the article sets of the subfields. Let S be a subfield. The following six steps were carried out to define a set of articles for S:
1. From the publication list for S in the BATTERY 2030+ roadmap, the subset of articles covered by Bibmet was selected. Let S_r be this subset. Where deemed desirable, S_r was expanded with additional articles selected by the BATTERY 2030+ consortium.
2. For each article x in S_r, each article cited by x and covered by Bibmet was added to S_r. Let S_a be the resulting set. The articles in S_a were considered seed articles: articles that can be used to obtain additional, similar publications.
3. The articles in S_a were located in the classification system at the most fine-grained level, level-1 (with 158,783 clusters), and at the next most fine-grained level, level-2 (with 5,053 clusters). For both levels, Excel sheets were created in which the identified clusters were sorted in descending order by the number of articles from S_a, i.e. the number of seed articles for S, that each cluster contains. Besides the number of seed articles, the sheets included, for instance, the cluster labels. Moreover, sheets with bibliographic information on the articles belonging to the identified clusters were created.
4. For the clusters with the highest frequencies of articles from S_a, keyword co-occurrence networks and co-publishing networks of countries and organizations were created. The networks were visualized, and the visualizations were stored as image files.
5. At least one subject expert for the subfield S analyzed the sheets from step 3 and the image files from step 4. The subject expert(s) marked the clusters that in her/his view are relevant, i.e. should be included in the analysis, and provided this and other feedback to the authors of this report.
6. The union of the clusters marked as relevant by the subject expert(s), denoted U_S, was obtained. U_S constitutes the set of articles assumed to represent the subfield S in the analysis.
Thus, executing steps 1-6 for each subfield yielded six article sets, where each set is our operationalization of the corresponding subfield.
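The automatable parts of the procedure above (steps 1-3 and 6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the report: the data structures (`references`, `in_bibmet`, `cluster_of`) and all ids are hypothetical, a single cluster level is used, the optional consortium additions in step 1 are omitted, and the manual steps 4-5 (visualization and expert review) are represented only by the set of clusters passed to `subfield_set`.

```python
from collections import Counter


def seed_articles(roadmap_refs, references, in_bibmet):
    """Steps 1-2: roadmap articles covered by the database (S_r) plus the
    covered articles they cite; the result is the seed set S_a."""
    s_r = {a for a in roadmap_refs if a in in_bibmet}
    s_a = set(s_r)
    for a in s_r:
        s_a.update(c for c in references.get(a, []) if c in in_bibmet)
    return s_a


def rank_clusters(s_a, cluster_of):
    """Step 3: (cluster, number of seed articles), sorted in descending order."""
    counts = Counter(cluster_of[a] for a in s_a if a in cluster_of)
    return counts.most_common()


def subfield_set(relevant_clusters, cluster_of):
    """Step 6: U_S, the union of the clusters marked as relevant by experts."""
    return {a for a, c in cluster_of.items() if c in relevant_clusters}


# Hypothetical toy data: article ids, reference lists and cluster assignments.
references = {"r1": ["c1", "c2", "x"], "r2": ["c2"]}
in_bibmet = {"r1", "r2", "c1", "c2"}
cluster_of = {"r1": 10, "r2": 10, "c1": 10, "c2": 20, "o1": 20, "o2": 30}

s_a = seed_articles(["r1", "r2", "r3"], references, in_bibmet)
print(rank_clusters(s_a, cluster_of))          # [(10, 3), (20, 1)]
print(sorted(subfield_set({10}, cluster_of)))  # ['c1', 'r1', 'r2']
```

Note that U_S contains every article of the selected clusters, not only the seed articles, which is how the procedure extends the roadmap lists to similar research.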
For the part of the analysis that treats the BATTERY 2030+ field as a whole, we took two operationalization approaches. In the first approach, the union of the six article sets (the U_S sets) was used as an operationalization of the field. Let POOL denote this set. However, since POOL may represent the BATTERY 2030+ field quite narrowly, we used a larger set of articles in the second approach. This set, denoted WIDE, is based on a wider selection of larger level-2 clusters, which cannot necessarily be tied directly to the specific subfields of BATTERY 2030+, but which are relevant to the broader battery field as defined by the articles in the six sets of seed articles. Further, the selected level-2 clusters are ranked high, with respect to the number of seed articles they contain, for at least one of the six subfields. More precisely, for each included level-2 cluster C, (1) there are at least two subfields S and S' such that C is among the five highest ranked clusters for both S and S' with respect to the number of seed articles, or (2) C has been selected by subject experts for at least one subfield. Table A1 in Appendix 1 lists the clusters used to define WIDE, and also indicates whether these clusters belong to the 10 highest ranked clusters in each subfield with respect to the number of seed articles. Figure 1 gives a conceptual view of the two approaches. Note that not all articles in POOL are included in WIDE.

Figure 1. Conceptual view of the two approaches to the operationalization of the whole BATTERY 2030+ field, based on the cluster selection method. The red circles represent article sets based on the clusters selected for the specific subfields, and the blue circle represents an article set based on a wider selection of larger level-2 clusters.
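The two operationalizations can be expressed as simple set operations. The sketch below is a hypothetical illustration of the selection rule for WIDE (rule 1: among the five highest ranked level-2 clusters for at least two subfields; rule 2: selected by subject experts) and of POOL as the union of the U_S sets. The cluster ids, subfield rankings and function names are invented for the example; the WIDE article set would then consist of all articles in the returned clusters.

```python
from collections import Counter


def wide_clusters(ranked_by_subfield, expert_selected):
    """Level-2 clusters included in WIDE.

    ranked_by_subfield: subfield -> level-2 cluster ids ordered by number of
    seed articles (highest first).
    expert_selected: level-2 cluster ids selected by subject experts.
    """
    top5_hits = Counter(c for ranked in ranked_by_subfield.values()
                        for c in set(ranked[:5]))
    rule1 = {c for c, n in top5_hits.items() if n >= 2}  # top five in >= 2 subfields
    return rule1 | set(expert_selected)                    # rule (2)


def pool(subfield_sets):
    """POOL: the union of the six U_S article sets."""
    return set().union(*subfield_sets)


ranked = {"BIG": [1, 2, 3, 4, 5, 6], "MAP": [7, 2, 8, 9, 10], "Recyclability": [11, 12, 6]}
print(sorted(wide_clusters(ranked, expert_selected={11})))  # [2, 11]
print(sorted(pool([{"a1", "a2"}, {"a2", "a3"}])))          # ['a1', 'a2', 'a3']
```

In the toy data, cluster 6 is not selected: it is in the top five only for Recyclability (it is sixth for BIG), illustrating why rule 1 requires a high rank in at least two subfields.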

Indicators
For selected countries/country aggregates and organizations, and for each subfield and the BATTERY 2030+ field, the indicators in Table 2 are used to describe performance. Of the four citation-based indicators, cf and Ptop10% are publication-level indicators, whereas jcf and Jtop25% are journal-level indicators. jcf is a field normalized counterpart to the well-known Journal Impact Factor.
The four citation-based indicators are calculated using fractional counting. An author's fraction of an article is 1/n, where n is the number of authors of the article. A unit's (e.g. an organization's) fraction of the article is then the sum of the fractions of the authors affiliated with the unit in the article. If an author is affiliated with more than one unit in the article, the author's fraction is distributed uniformly across these units. Fractional counting yields a more appropriate field normalization of citation impact indicators than full counting does.
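The fractional counting rule above can be sketched in a few lines. In this minimal, hypothetical example each author is represented by the set of units they are affiliated with in the article; the unit names are illustrative. The text does not say how authors without any unit are handled, so here they simply contribute nothing (an assumption).

```python
from collections import defaultdict


def unit_fractions(authors):
    """Fractional counts of one article per unit.

    authors: one entry per author, each a set of the units (e.g. organizations)
    the author is affiliated with in this article. An author with no unit
    contributes nothing (an assumption; the text does not cover this case).
    """
    n = len(authors)
    fractions = defaultdict(float)
    for units in authors:
        for unit in units:
            # The author's 1/n is split uniformly across the author's units.
            fractions[unit] += 1 / (n * len(units))
    return dict(fractions)


# Three authors; the second is affiliated with both KTH and Uppsala University.
print(unit_fractions([{"KTH"}, {"KTH", "UU"}, {"UU"}]))  # KTH and UU each about 0.5
```

Because every author's 1/n is fully distributed, the unit fractions of an article sum to 1, which is what makes the weighted world averages of the citation indicators come out at exactly 1 (or 10 and 25 percent).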

P frac
Fractional counts of articles.
cf
Mean field normalized citation rate. This indicator normalizes for the variation in citation patterns between subject fields. Each article is compared to a reference group of articles. In our case, for an article a in the set U_S (the set of articles assumed to represent the subfield S in the analysis), the reference group consists of all articles in U_S published in the same year as a. The number of citations of a is divided by the average number of citations of the articles in U_S published in the same year as a, which gives a field normalized citation rate for a. For a given country/country aggregate/organization represented in U_S and a given publication year, the cf value is the (fractionally weighted) average field normalized citation rate of the unit's articles in U_S published in that year. The weighted average of the cf values of all countries/country aggregates/organizations for a given year, where the weight of a unit is its fractional number of articles, is equal to 1. Therefore, a cf value above 1 for a country/country aggregate/organization indicates that its articles are cited above world average; e.g. a value of 1.2 indicates that its articles are cited 20 percent above world average.

Ptop10% (expressed as share)
The share of articles among the 10 percent most cited. The same reference group as for the mean field normalized citation rate (cf) is used. An article can partly belong to the 10 percent most cited articles if several articles have the same citation count as the percentile limit. The weighted average of the Ptop10% values of all countries/country aggregates/organizations for a given year, where the weight of a unit is its fractional number of articles, is equal to 10 (percent).
jcf
Mean field normalized citation rate for journals. This indicator shows the citation impact of the journals in which the unit has published. It is calculated as the average of the field normalized citation rates of the journals in which the unit has published. If the unit has published multiple articles in the same journal, the journal's field normalized citation rate is counted multiple times. This journal indicator is normalized for field differences by the same principles as the mean field normalized citation rate (cf). However, in this case the Web of Science subject categories of journals are used as the basis for the reference groups. For an article b in a given journal J, the reference group consists of all articles that appear in journals belonging to the same Web of Science subject category (or categories) as J and that are published in the same year as b. For an article a in the set U_S published in year y, the value of the journal of a is based on the years y-5 to y-1. The weighted average of the jcf values of all countries/country aggregates/organizations for a given year, where the weight of a unit is its fractional number of articles, is equal to 1.

Jtop25% (expressed as share)
The share of articles published in journals that are among the 25 percent most cited. The same reference group as for the mean field normalized citation rate for journals (jcf) is used. The journals in the top 25 percent category publish 25 percent of the articles in the reference group. A journal can partly belong to the top 25 percent if it straddles the percentile limit or if it has been classified into multiple fields with different percentile limits. The weighted average of the Jtop25% values of all countries/country aggregates/organizations for a given year, where the weight of a unit is its fractional number of articles, is equal to 25 (percent).

IntColl%
This indicator shows the share of articles that have been co-published by authors from two or more countries. By default, this indicator is presented using full counts.
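The cf entry of Table 2 can be illustrated with a minimal sketch, assuming each article is a record with a publication year, a citation count and its unit fractions (all data below are hypothetical). It computes cf over all years passed in, as in the whole-period tables of Section 3, and checks the property stated in the table: the fraction-weighted average of the unit values equals 1. The handling of a year in which no article is cited is an assumption, since the text does not cover it.

```python
from collections import defaultdict


def cf_by_unit(articles):
    """Mean field normalized citation rate (cf) and fractional count per unit.

    articles: dicts with "year", "citations" and "fractions" (unit -> share).
    All articles are assumed to come from one article set (e.g. one U_S), so the
    reference group of an article is the set's articles from the same year.
    """
    cites, n = defaultdict(float), defaultdict(int)
    for a in articles:
        cites[a["year"]] += a["citations"]
        n[a["year"]] += 1
    weighted, p_frac = defaultdict(float), defaultdict(float)
    for a in articles:
        mean = cites[a["year"]] / n[a["year"]]
        # Normalized citation rate; a year with no citations at all gives 0
        # here (an assumption, the text does not cover this case).
        ncs = a["citations"] / mean if mean > 0 else 0.0
        for unit, f in a["fractions"].items():
            weighted[unit] += f * ncs
            p_frac[unit] += f
    cf = {u: weighted[u] / p_frac[u] for u in p_frac}
    return cf, dict(p_frac)


articles = [  # hypothetical data
    {"year": 2015, "citations": 10, "fractions": {"EU": 1.0}},
    {"year": 2015, "citations": 2, "fractions": {"CN": 0.5, "EU": 0.5}},
    {"year": 2016, "citations": 4, "fractions": {"NA": 1.0}},
    {"year": 2016, "citations": 0, "fractions": {"CN": 1.0}},
]
cf, p_frac = cf_by_unit(articles)
overall = sum(cf[u] * p_frac[u] for u in cf) / sum(p_frac.values())
print(round(cf["EU"], 3), round(overall, 3))  # 1.222 1.0
```

Here EU would be read as cited about 22 percent above world average, while the weighted average over all units is 1 by construction.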
In Table 2, for the field normalized citation indicators cf and Ptop10%, we only describe the reference group for an article in a given U_S, corresponding to the subfield S. For POOL as an operationalization of the BATTERY 2030+ field, an article a in POOL belongs to exactly one U_S, and this U_S is the reference group for a with respect to the two indicators. For WIDE as an operationalization, the reference group for an article a in WIDE, with respect to the two indicators, is WIDE, regardless of whether a belongs to some U_S. Note that the calculation of the two journal-level field normalized citation indicators, jcf and Jtop25%, is not affected by whether subfields or the BATTERY 2030+ field are analyzed.
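For Ptop10%, the reference group described above (a U_S or WIDE, restricted to one publication year) determines which articles fall among the 10 percent most cited. The sketch below shows one simple way of letting tied articles "partly belong" to the top 10 percent: articles tied at the percentile limit share the remaining top-10% slots equally, so the shares always sum to 10 percent of the group size. This is an illustrative assumption; the exact procedure used at KTH is given in the formal-definitions document referenced in this section and may differ in detail. The data are hypothetical.

```python
from itertools import groupby


def top10_shares(citations):
    """Membership (0..1) of each article in the top 10% of one reference group.

    Articles tied at the percentile limit share the remaining top-10% slots
    equally, so the shares sum to 10% of the group size.
    """
    remaining = 0.1 * len(citations)
    share_of = {}
    for value, group in groupby(sorted(citations, reverse=True)):
        k = len(list(group))
        share_of[value] = min(1.0, remaining / k)
        remaining = max(0.0, remaining - share_of[value] * k)
    return [share_of[c] for c in citations]


def ptop10_percent(shares, fractions):
    """Unit-level Ptop10% (percent): fraction-weighted mean of the article shares."""
    return 100 * sum(s * f for s, f in zip(shares, fractions)) / sum(fractions)


# Hypothetical reference group of ten articles from one set and year.
group = [5, 5, 3, 3, 3, 1, 1, 1, 1, 1]
shares = top10_shares(group)
print(shares[:3], sum(shares))  # [0.5, 0.5, 0.0] 1.0
```

With ten articles there is exactly one top-10% slot; the two articles tied at 5 citations therefore each count as half a top article.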
Note that for the citation part of the study, the last publication year considered is 2018. This avoids an improperly short citation window for the last publication year of the study (2019). Citations are counted with an open window up to the time of the analysis (the last quarterly update of Bibmet); hence, all citations from articles registered in the database at that point are counted. For all citation statistics, author self-citations are excluded, defined as citations where any author name is the same in the citing and the cited article.
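The citation-window and self-citation rules above can be sketched as follows. This minimal example assumes that author names are compared as exact strings, as the definition reads; the actual name matching in Bibmet may be more involved. The records and names are hypothetical.

```python
def is_self_citation(citing_authors, cited_authors):
    """True if any author name occurs in both the citing and the cited article."""
    return not set(citing_authors).isdisjoint(cited_authors)


def citation_count(cited, citing_articles):
    """Citations to `cited` with an open window, excluding author self-citations."""
    return sum(1 for c in citing_articles
               if not is_self_citation(c["authors"], cited["authors"]))


def in_citation_analysis(article, first_year=2010, last_year=2018):
    """Publication years included in the citation statistics."""
    return first_year <= article["year"] <= last_year


cited = {"year": 2016, "authors": ["Smith J", "Li X"]}  # hypothetical records
citing = [{"authors": ["Li X", "Wang Y"]}, {"authors": ["Berg A"]},
          {"authors": ["Smith J"]}, {"authors": ["Olsen K"]}]
print(in_citation_analysis(cited), citation_count(cited, citing))  # True 2
```

A single shared name is enough to exclude a citation, so the rule is deliberately strict.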
For detailed documentation of the calculation of the two publication-level field normalized citation indicators and the two corresponding journal-level indicators, see Ahlgren et al. (2021) and the openly available document "Formal definitions of field normalized citation indicators and their implementation at KTH Royal Institute of Technology", respectively.
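The jcf definition can be illustrated with the hedged sketch below. It assumes that the field normalized citation rates of each journal's articles (normalized against the same WoS subject category and publication year) are already available, and it reads "based on the years y-5 to y-1" as averaging over the journal's articles published in those years. Both points, the skipping of journals without a value, and all names and data are assumptions for illustration; the formal-definitions document is the authoritative source.

```python
def journal_score(ncs, journal, y):
    """Field normalized citation rate of `journal` for an article published in y.

    ncs: (journal, year) -> list of normalized citation rates of the journal's
    articles (each normalized against the same WoS subject category and year).
    Assumption: the journal value is the mean over publication years y-5..y-1.
    """
    values = [v for yr in range(y - 5, y) for v in ncs.get((journal, yr), [])]
    return sum(values) / len(values) if values else None


def jcf(unit_articles, ncs):
    """unit_articles: (journal, year, fraction) triples; a journal used several
    times by the unit is counted several times."""
    num = den = 0.0
    for journal, year, f in unit_articles:
        score = journal_score(ncs, journal, year)
        if score is not None:  # journals without a value are skipped (assumption)
            num += f * score
            den += f
    return num / den if den else None


ncs = {("J1", 2013): [1.0, 3.0], ("J1", 2018): [100.0], ("J2", 2014): [0.5]}
print(journal_score(ncs, "J1", 2018))  # 2.0 (2018 itself is outside the window)
print(jcf([("J1", 2018, 1.0), ("J1", 2018, 0.5), ("J2", 2018, 0.5)], ncs))  # 1.625
```

Because jcf depends only on where a unit publishes, it can diverge from cf, as in the Self-healing results where EU & associated organizations publish in well-cited journals without attracting many citations themselves.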

Results
In this section, we present the results of the analysis. Each of Sections 3.1-3.8, which correspond to the six subfields (Sections 3.1-3.6) and the BATTERY 2030+ field (Sections 3.7 and 3.8), has three subsections. The first subsection treats the country/country aggregate level. It gives a table with indicator values by country/country aggregate, as well as line graphs of publication volume (P full) and citation impact (cf and Ptop10%). In these graphs, the horizontal axis corresponds to publication year, and in all cf and Ptop10% graphs a dashed grey line indicates world average. The second subsection concerns the organization level and contains a table corresponding to the table in the first subsection. Thirteen organizations are included in this table: the top 10 EU & associated organizations with respect to publication volume (P full), and the top organization from each of China, North America and JKS with respect to the same indicator. The subsection also reports how often companies occur in the articles of the subfield/field. Note that identifying organization types in bibliometric studies can be difficult, especially for companies. The highlighted companies are therefore samples and do not give the complete picture.
In the third subsection, three bibliometric networks are visualized. First, a co-occurrence network of author keywords is visualized using VOSviewer, a publicly available program from CWTS, Leiden University (van Eck & Waltman, 2010). Keywords were unified in VOSviewer based on manually created thesaurus files, i.e. files in which keyword variants are mapped to a standard variant. In the network, nodes represent keywords, and the larger a node, the higher its weight, where weight is defined as the number of articles in which the keyword occurs. A link between two nodes indicates that the corresponding keywords co-occur in at least one article. The thicker a link, the higher its strength, where strength is defined as the number of articles in which the two keywords co-occur. The distance between nodes approximately indicates the strength of the co-occurrence relation between the corresponding keywords. However, the VOSviewer layout technique by default uses a normalized link strength, association strength: the link strength divided by the product of the two node weights. Note that VOSviewer clusters the keywords using modularity-based clustering (Newman 2004a, 2004b), where in our case the underlying relatedness measure between two keywords is association strength. All nodes in a given cluster have the same color, and nodes in different clusters have different colors. The third subsection also contains visualizations of co-publishing networks at both the country and the organization level. Here, the nodes represent countries (organizations), and a link between two nodes indicates that there is at least one article in which the corresponding two country names (organization names) co-occur.
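The node weights, link strengths and association strengths used for these networks can be computed from per-article item lists, as in the minimal sketch below. It follows the definitions given in the text (association strength = link strength divided by the product of the two node weights); VOSviewer computes these internally and may differ in details such as scaling, and the thesaurus-based unification, layout and clustering are not shown. The keyword lists are hypothetical; the same function applies to country or organization names for the co-publishing networks.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(article_items):
    """Node weights, link strengths and association strengths.

    article_items: one list per article of its (unified) author keywords, or
    of its country/organization names for a co-publishing network.
    """
    weights, links = Counter(), Counter()
    for items in article_items:
        unique = sorted(set(items))            # count each item once per article
        weights.update(unique)                 # node weight: number of articles
        links.update(combinations(unique, 2))  # link strength: co-occurring articles
    # Association strength: link strength / product of the two node weights.
    assoc = {(i, j): s / (weights[i] * weights[j]) for (i, j), s in links.items()}
    return weights, links, assoc


docs = [["li-ion", "sei"], ["li-ion", "sei", "electrolyte"], ["li-ion", "electrolyte"]]
weights, links, assoc = cooccurrence_network(docs)
print(weights["li-ion"], links[("li-ion", "sei")], assoc[("electrolyte", "sei")])  # 3 2 0.25
```

The normalization lowers the relatedness of pairs involving very frequent items (such as "li-ion" here), so that a large central node does not dominate the layout simply by occurring everywhere.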
For the co-publishing networks, the weight of a node is defined as the number of articles in which the country name (organization name) occurs, and the link strength is defined as the number of articles in which the two country names (organization names) co-occur. The nodes were clustered by VOSviewer using the same methodology as for the author keywords, and association strength was used for the layout, as in the keyword case.

Battery Interface Genome (BIG)
In this section, we give the results for BIG. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.

Country/country aggregates
In Table 4, indicator values by country/country aggregate and for the whole publication period are given.
North America has a fairly stable publication volume from 2013 onwards (Figure 2). China has, however, caught up in later years. Note that the yearly publication volumes for EU & associated and JKS are small, especially in the first part of the study period. Regarding cf and Ptop10%, the values fluctuate considerably, more so when the number of publications is low, and it is difficult to discern a clear pattern (Figures 3 and 4).

There is a large variability in performance among the EU & associated organizations with regard to cf and Ptop10%. It should be kept in mind, however, that the publication volumes of these organizations are very small, so it is difficult to draw any firm conclusions. Among the companies publishing in BIG, the four largest with respect to publication volume are Guangzhou Tinci Materials Technology (30), Samsung (28), 3M (24) and GM (13). For the last three companies, the publications are associated with their US branches, whereas the publications of Guangzhou Tinci Materials Technology come from China.

Bibliometric networks
The network in Figure 5 gives an overview of the author keywords used in the articles selected for the BIG subfield. Most research in the field thus far has focused on Li-ion batteries, as shown by the largest node in the center. On the left-hand side of the figure, in the green and yellow clusters, are keywords mostly associated with the chemical engineering of the interface between the positive Li-ion electrode and the electrolyte. Typical electrode materials (such as LiCoO2) and electrolyte components (especially the well-known fluorinated compounds) active at these interfaces are found here. On the bottom right-hand side, in red, concepts primarily associated with the interface between the negative electrode and the electrolyte form a cluster. As expected, "solid electrolyte interphase" is a major node here.
The networks in Figures 6 and 7 show the collaboration networks between countries and between organizations within BIG, respectively. A relatively strong connection between North America and China can be observed in both the country network and the organization network; in the latter, most Chinese and North American organizations are found on the left (Figure 7). There is also a strong connection between Japanese and South Korean organizations. In Figure 7, most European organizations are found in the upper part of the network (light blue and purple).

Materials Acceleration Platform (MAP)
In this section, we give the results for MAP. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.

Country/country aggregates
In Table 6, indicator values by country/country aggregate and for the whole publication period are given. MAP is clearly a very strong subfield for North America: regardless of citation impact indicator, North America has by far the best performance among the four units. China and JKS perform poorly for cf and Ptop10%, and China lags behind compared with its performance in some other subfields analyzed in this report.

Among the 10 EU & associated organizations, Germany and Switzerland dominate with respect to country of origin. There is a large variability in performance among these 10 organizations with regard to cf and Ptop10%. Technical University of Berlin has the highest values on the two indicators. Further, all 24 articles in which this organization has participated have been internationally co-authored (IntColl% equal to 100.0%). University of California, Berkeley has the highest number of articles (P full) and also performs strongly on the citation impact indicators. Generally, EU & associated organizations have very high values on the two journal-level citation impact indicators, jcf and Jtop25%. For MAP, the company publication volumes are relatively low. A notable exception is Citrine Informatics, with 14 publications; this company focuses on AI for materials development.

Bibliometric networks
The network in Figure 11 gives an overview of the author keywords used in the articles selected for the MAP subfield. The figure makes evident a strong focus on computer science in MAP, an article set composed of one level-2 cluster. Several keywords, such as "machine learning" and "high-throughput experimentation", are connected to AI-related subjects. This is in line with the vision outlined in the BATTERY 2030+ roadmap, a vision inspired by the route taken by the pharmaceutical industry in drug discovery, where state-of-the-art computational schemes are coupled with combinatorial material screening methodologies. The clusters are strongly nested, which likely reflects that MAP is currently in a strongly exploratory phase in which a large number of ideas are combined and evaluated.
The networks in Figures 12 and 13 show the collaboration networks between countries and between organizations within MAP, respectively. As is clear from Figure 12, the US dominates MAP. China has a lower publication volume than one might expect, whereas Germany and Japan have higher volumes than expected.

Recyclability
In this section, we give the results for Recyclability. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.

Country/country aggregates
In Table 8, indicator values by country/country aggregate and for the whole publication period are given. North America is surprisingly weak regarding the publication-level citation impact indicators, cf and Ptop10%, compared to several other subfields. China has by far the best performance for these two indicators.
As for several other subfields, China shows a remarkable increase in publication volume in later years (Figure 14). By contrast, the volume values for JKS are quite stable. For both cf and Ptop10%, EU & associated shows possibly negative trends (Figures 15 and 16).

EU & associated has relatively few articles per organization. Among these organizations, Karlsruhe Institute of Technology has the highest number of articles, 20. Note that, unusually, the Chinese Academy of Sciences is not the Chinese organization with the highest number of articles; instead, Tsinghua University has the highest number, 55. There is a large variability in performance among the 13 organizations with regard to cf and Ptop10%. However, the values are uncertain due to small publication volumes. For Recyclability, the company publication volumes are relatively low. The only exception is Ford Motor Company, with 9 publications.

Bibliometric networks
The network in Figure 17 gives an overview of the author keywords used in the articles selected for the Recyclability subfield. The network is clearly separated into two themes: one dealing with aspects of electric vehicles, the other with more chemistry- and process-oriented aspects of battery recycling. The two themes are bridged by the node "lithium-ion batteries". The label of the largest green node, "recyclability", is not shown.
The networks in Figures 18 and 19 show the collaboration networks between countries and between organizations within Recyclability, respectively. In Figure 18, a strong collaboration link is visible between China and the US. The organization network within Recyclability (Figure 19) is more disconnected than the corresponding networks for the other subfields. We therefore show the full disconnected network, in which about 24% of the nodes are not connected to the main network. Possible causes of the disconnectedness are a narrow cluster selection, and thereby a smaller article set, and the fact that the cluster selection seems to represent two quite distinct themes in battery recycling (cf. the comments on the author keyword co-occurrence network).

Figure 17. Author keyword co-occurrence network for Recyclability. Minimum node (author keyword) weight is set to 4.
Figure 18. Country co-publishing network for Recyclability. Minimum node (country name) weight is set to 2.
Figure 19. Organization co-publishing network for Recyclability. Minimum node (organization name) weight is set to 3. Note that the full disconnected network is visualized.

Smart functionalities: Self-healing
In this section, we give the results for Self-healing. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.
Note that the cluster selection for Self-healing is rather broad compared to the other five subfields, owing to the areas deemed relevant by the subject experts. This subfield therefore comprises self-healing in a general sense, and not only self-healing directly related to battery research.

Country/country aggregates
In Table 10, indicator values by country/country aggregate and for the whole publication period are given. Regardless of citation impact indicator, EU & associated has the worst performance among the four units. For instance, its Ptop10% value is only 5.6%, about 44% below the world average of 10%. North America has the highest citation impact values, regardless of indicator.
For Self-healing, China shows a remarkable increase in publication volume in later years (Figure 20). However, this is a general trend in Chinese research (Cao et al., 2020). Noteworthy is that JKS has a very good Ptop10% performance, and a good cf performance, for the last publication year considered, almost as good as the performance of North America (Figures 21 and 22).

The poor EU & associated performance in cf and Ptop10% is also evident in Table 11. However, the selected EU & associated organizations perform considerably better with respect to the two journal-level citation impact indicators, jcf and Jtop25%. This gap between publication-level and journal-level citation impact can be seen as problematic (publishing in good venues but not attracting many citations). For Self-healing, the companies with the highest publication volumes are Samsung (27) and General Motors Company (15).

Bibliometric networks
The network in Figure 23 gives an overview of the author keywords used in the articles selected for the Self-healing subfield. It is clear that this subfield is not primarily dealing with battery research but is rather oriented towards self-healing in soft materials research. However, there is an emerging bridge to battery research (indicated by the blue cluster) in the form of next-generation materials such as graphene and carbon nanotubes. Overall, the network also indicates some of the broader topical trends in the self-healing area, such as mechanical properties (often dealing with non-biomaterials), microcapsule delivery in pharmaceuticals, and hydrogels.
The networks in Figures 24 and 25 show the collaboration networks between countries and between organizations within Self-healing, respectively. Again, China and the US are the most prominent nodes (Figure 24), with South Korea in their collaborative neighborhood. Germany and France are the largest European countries. The Chinese presence is even more apparent in the organization network (Figure 25). However, this network is relatively unstructured, and it is difficult to see a clear pattern.

Figure 23. Author keyword co-occurrence network for Self-healing. Minimum node (author keyword) weight is set to 11.
Figure 24. Country co-publishing network for Self-healing. Minimum node (country name) weight is set to 3.
Figure 25. Organization co-publishing network for Self-healing. Minimum node (organization name) weight is set to 12.

Smart functionalities: Sensing
In this section, we give the results for Sensing. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.

Country/country aggregates
In Table 12, indicator values by country/country aggregate and for the whole publication period are given. It is clear from the table that JKS is lagging, both in volume and in citation impact (regardless of indicator). EU & associated performs worse than China and North America for the publication-level citation impact indicators cf and Ptop10%. EU & associated performs better on the two journal-level citation impact indicators, jcf and Jtop25%, than on cf and Ptop10%.
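Ptop10% is a publication-level indicator. As a rough sketch, assuming it is the share of an entity's articles that belong to the 10% most cited articles of their reference set (for instance the same subfield and publication year), it could be computed as below. The tie rule (an article counts as top 10% if fewer than 10% of its reference set is cited strictly more) is a simplification chosen for illustration; established implementations handle ties more carefully. Function names and data are hypothetical.

```python
def is_top10(citations, reference_citations):
    # Simplified tie rule (an assumption): the article is top 10% if fewer
    # than 10% of its reference set has strictly more citations.
    higher = sum(1 for c in reference_citations if c > citations)
    return higher / len(reference_citations) < 0.10


def ptop10_share(entity_articles, reference_sets):
    """Share of an entity's articles that are top 10% in their reference set.

    entity_articles: list of (citations, reference_key) pairs, where the key
        identifies e.g. a (subfield, publication year) combination.
    reference_sets: dict mapping reference_key -> list of citation counts.
    """
    flags = [is_top10(c, reference_sets[key]) for c, key in entity_articles]
    return sum(flags) / len(flags)


# Hypothetical reference set with citation counts 0, 1, ..., 19.
key = ("Sensing", 2018)
reference = {key: list(range(20))}
articles = [(19, key), (18, key), (5, key), (0, key)]
print(ptop10_share(articles, reference))  # 0.5
```

Because the indicator only asks whether each article clears the top-10% threshold of its own reference set, it is less sensitive to a few extremely cited articles than a mean-based indicator such as cf.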
As in several of the analyzed subfields, China has a remarkable increase in publication volume over time (Figure 26). There is a decrease in publication volume for North America in later years, which is perhaps surprising. Note that China, EU & associated and North America have similar cf and Ptop10% performance for the last considered publication year, 2018 (Figures 27 and 28).

Organizations
Sunwoda Electronic Co, a battery producer that also supplies the vehicle industry, is the company with the highest publication volume (21). In general, many car companies publish in Sensing, for instance General Motors Company and Mitsubishi Corporation.

Bibliometric networks
The network in Figure 29 gives an overview of the author keywords used in the articles selected for the Sensing subfield. The left side of the network deals with applied, battery performance-related aspects of sensing. The blue cluster is primarily associated with concepts related to battery charge state (e.g. state of charge, open circuit voltage). The yellow cluster relates to battery lifetime aspects (e.g. state of health), whereas the green cluster clearly represents battery safety-related topics (e.g. heat generation, fire behavior). On the other side, the purple cluster deals more directly with specific sensing technologies. The optical methods indicated should primarily be seen as examples (e.g. Fiber Bragg grating-based sensing).
The networks in Figures 30 and 31 show the collaboration networks between countries and between organizations within Sensing, respectively. In terms of publication volume, Canada and the United Kingdom are more prominent relative to China and the US than in the other five subfields (Figure 30; the node for Canada is the relatively large, green node near the node for the US). The network of Figure 31 is somewhat unstructured but dominated by Chinese organizations. Most of the European organizations seem to be located in the lower part of the map, close to several Canadian organizations.
Figure 29. Author keyword co-occurrence network for Sensing. Minimum node (author keyword) weight is set to 6.
Figure 30. Country co-publishing network for Sensing. Minimum node (country name) weight is set to 3.
Figure 31. Organization co-publishing network for Sensing. Minimum node (organization name) weight is set to 6.

Manufacturability
In this section, we give the results for Manufacturability. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.
We point out that we faced problems in defining the set of articles for the subfield Manufacturability. Given our seed methodology, there was an overlap in the potential clusters between the subfields BIG, Manufacturability, and MAP. We return to this issue in the section "Discussion".

Country/country aggregates
In Table 14, indicator values by country/country aggregate and for the whole publication period are given. The volume (P full) for China is comparatively low, whereas the cf and Ptop10% performance of North America is surprisingly poor. Compared to other subfields, JKS is doing well regarding the citation impact indicators.
As in several of the analyzed subfields, China has a remarkable increase in publication volume over time (Figure 32). For the last considered citation impact year, 2018, EU & associated has the lowest values on cf and Ptop10% (Figures 33 and 34).

Organizations
Among the 10 organizations from EU & associated, five are from Germany and three from France. Note that the publication volumes are quite low overall. The four companies with the highest publication volumes are General Motors Company (12), Samsung (11), Robert Bosch (10) and Ford Motor Company (7).

Bibliometric networks
The network in Figure 35 gives an overview of the author keywords used in the articles selected for the Manufacturability subfield. This network is quite hard to interpret and does not show a clear grouping of subjects. The left-hand side of Figure 35, however, includes a few aspects, such as microstructure and electrode thickness, which are important in the field of manufacturability. A number of analysis and simulation methods that can be used to improve production processes can also be seen. We discuss further challenges in defining and identifying the Manufacturability subfield in the section "Discussion".
The networks in Figures 36 and 37 show the collaboration networks between countries and between organizations within Manufacturability, respectively. For the country network (Figure 36), as noted above, the publication volume of China is relatively low. Apart from that, Germany stands out compared both to the US and to other European countries. With regard to the organization network (Figure 37), European organizations are located to the right of the map, whereas Chinese and US organizations are intermingled to the left. Many organizations from the United Kingdom are located centrally in the map and can be seen as a bridge between China-US and Europe.

The BATTERY 2030+ field-POOL
In this section, we give the results for the BATTERY 2030+ field as a whole, operationalized as the article set POOL. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, one table is given. The third subsection visualizes three bibliometric networks.
Note that the article sets for the subfields are of different size, and therefore the pooled set will be most influenced by the subfields with larger sets (also see the section "Discussion"). Consequently, the pooled results are strongly influenced by the results in Self-healing, a set with 7,127 articles.
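Because POOL is the union of non-overlapping subfield sets, a mean-based indicator over POOL is a size-weighted average of the subfield means. The minimal sketch below uses hypothetical subfield sizes and means (not values from this study) to show how the largest set pulls the pooled value toward its own mean.

```python
def pooled_mean(subfields):
    """Article-level mean of an indicator over the union of subfield sets.

    subfields: dict mapping subfield name -> (number_of_articles, mean_value).
    Assumes the subfield article sets do not overlap, so the union size is
    simply the sum of the subfield sizes.
    """
    total = sum(n for n, _ in subfields.values())
    # Each subfield contributes in proportion to its number of articles.
    return sum(n * m for n, m in subfields.values()) / total


def unweighted_mean(subfields):
    """Plain average of the subfield means, ignoring subfield size."""
    return sum(m for _, m in subfields.values()) / len(subfields)


# Hypothetical sizes and means, chosen only to illustrate the weighting.
example = {"large subfield": (7000, 0.8), "small subfield": (1000, 1.4)}
print(pooled_mean(example))      # approximately 0.875
print(unweighted_mean(example))  # approximately 1.1
```

The pooled value (about 0.875) lies much closer to the large subfield's mean than the unweighted average of the two subfield means (about 1.1) does.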

Country/country aggregates
China's publication volume development from 2012 onwards is quite remarkable (Figure 38). For a majority of the last seven publication years, EU & associated has a cf performance similar to that of JKS (Figure 39). From 2013, China and North America consistently outperform EU & associated and JKS. This is generally also the case for Ptop10% (Figure 40). For this indicator, and the last considered publication year, North America has by far the best performance. China has a considerable increase from 2010 to 2013 for both cf and Ptop10%. However, for both these indicators, and in contrast to P full, China's indicator values are fairly stable from 2013 onwards.

Organizations
Note that the organization counts here depend on the counts within the constituent subfields. Therefore, the organizations found under POOL are mainly determined by the Self-healing subfield, due to the relatively high publication volume of this subfield. The performance of the organizations in EU & associated is better for the journal-level citation impact indicators (jcf and Jtop25%) than for the publication-level ones. This suggests that the organizations perform better with regard to publishing in highly cited journals than with regard to the extent to which their own articles are cited. Nanyang Technological University, in JKS, has high indicator values for all four citation impact indicators. When it comes to companies in POOL, the most prominent ones, based on publication volume, are represented by research activities in the US. The two companies with the highest publication volumes are Samsung (97) and General Motors Company (65). Among the top 15, only two EU companies, both in Germany, are represented: BMW Group and PSA Group.

Bibliometric networks
The network in Figure 41 gives an overview of the author keywords used in the articles selected for POOL. This network captures many of the aspects covered by the corresponding networks for the six subfields. Self-healing is still distinct (red cluster), and most aspects of recycling are in the lower part of the network. Sensing aspects are mostly to the right. In contrast to the corresponding network for Recyclability, the keyword electric vehicles is here placed closer to sensing than to general recycling. The network suggests that BIG and MAP are less distinct and have connections to several other subfields. It should be kept in mind that Self-healing might have a disproportionate weight in the network, since that subfield consists of a relatively large number of articles.
The networks in Figures 42 and 43 show the collaboration networks between countries and between organizations within POOL, respectively. As in several subfields, China and the US dominate the country network (Figure 42). Regarding EU & associated, the countries with the largest publication volumes are Germany, France, Italy, Switzerland and Belgium. In the organization network (Figure 43), organizations in EU & associated are located mainly to the left but with connections to China, Japan and the US. US organizations are placed quite centrally, Indian and South Korean organizations at the top, and Japanese ones towards the bottom.

The BATTERY 2030+ field-WIDE
In this section, we give the results for the BATTERY 2030+ field as a whole, operationalized as the article set WIDE. The section has three subsections. The first one, which concerns results for country/country aggregates, puts forward one table and three graphs. In the second subsection, in which we deal with results for the organization level, two tables are given. The third subsection visualizes three bibliometric networks.

Country/country aggregates
There are some clear differences in citation impact performance for some of the organizations represented both in Table 19 and in the corresponding POOL table (Table 17). Karlsruhe Institute of Technology and Uppsala University have negative changes in cf and Ptop10% compared to POOL, whereas Max Planck Society has a positive change. These changes, though, are related to differences in field normalization. Max Planck Society has a large share of its articles in the subfield MAP, which might have a higher citation density than some other subfields. Recall that the articles in POOL are normalized against their subfields (and not against POOL), while the articles in WIDE are normalized against WIDE itself. Germany has a strong foothold in WIDE: five of the ten EU & associated organizations are German universities or research institutes.
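The effect of the reference set can be made concrete with a small calculation. Assuming cf is an article's citation count divided by the mean citation count of its reference set (its subfield in POOL, WIDE as a whole in WIDE, presumably also matched by publication year), the same article scores differently depending on which set it is normalized against. Under that definition, a cf of 1.36 corresponds to 36% above world average. The function names and citation counts below are hypothetical.

```python
from statistics import mean


def cf_score(citations, reference_citations):
    """Field-normalized citation score of one article (assumed definition):
    its citations divided by the mean citations of its reference set."""
    return citations / mean(reference_citations)


def mean_cf(articles, reference_sets):
    """Mean cf of an organization's articles.

    articles: list of (citations, reference_key) pairs.
    reference_sets: dict mapping reference_key -> list of citation counts.
    """
    return mean(cf_score(c, reference_sets[key]) for c, key in articles)


# Hypothetical reference sets: a high-citation-density subfield versus a
# broader set with a lower average citation count.
reference_sets = {"MAP": [20, 30, 25], "WIDE": [5, 10, 15]}
print(cf_score(20, reference_sets["MAP"]))   # 0.8
print(cf_score(20, reference_sets["WIDE"]))  # 2.0
```

An article with 20 citations scores below world average against the high-density set but well above it against the broader set, which mirrors the kind of shift that normalizing against WIDE instead of a subfield can produce for an organization concentrated in MAP.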

Bibliometric networks
The network in Figure 47 gives an overview of the author keywords used in the articles selected for WIDE. In this operationalization of the BATTERY 2030+ field as a whole, which is wider than POOL, we observe that the network is not very differentiated. The green cluster seems to be related to the negative electrode and possibly to next-generation electrodes. The blue cluster deals more with classical Li-ion batteries, with a focus on the positive electrode. The red cluster seems to focus on the electrolyte but is less material-oriented than the blue and green clusters. In the red cluster, the link between electrolyte concepts and topics in BIG and MAP, such as machine learning and neural networks, can also be discerned. The yellow cluster mainly captures self-healing. Overall, the six subfields are less visible in this network than in the corresponding network for POOL, which is expected. Instead, this network provides a broader general overview of battery research, especially related to lithium batteries.

Discussion
In this work, we have used bibliometric methods to analyze battery research with the BATTERY 2030+ roadmap as point of departure. We treated the six battery research subfields identified in the BATTERY 2030+ roadmap: Battery Interface Genome (BIG), Materials Acceleration Platform (MAP), Recyclability, Smart functionalities: Self-healing, Smart functionalities: Sensing, and Manufacturability. Moreover, we analyzed the BATTERY 2030+ field as a whole, where two operationalizations of the whole were used. In the following list, we repeat the overarching aims of the analysis: (a) to evaluate the European standing in the subfields/the BATTERY 2030+ field in comparison to the rest of the world, (b) to identify strongholds of the subfields/the BATTERY 2030+ field across Europe.
In the remainder of this section, we reflect on the results, put forward methodological limitations and give conclusions.

Reflections on the results
For point (a) above, EU & associated has similar but slightly lower publication volumes than North America for both POOL and WIDE and for most subfields. However, in BIG and especially MAP, the publication volume from North America is considerably larger. One exception, where EU & associated has a higher publication volume, is Recyclability. Also note that MAP is the only subfield in which both North America and EU & associated have a higher volume than China. The citation performance (cf and Ptop10%) of EU & associated is similar to that of China and JKS and well below North America with regard to POOL and WIDE. Subfield exceptions are Recyclability, where EU & associated performs (cf) above North America and 7% above world average, and BIG, where EU & associated has the highest citation performance (cf), 36% above world average. Focusing on the end of the study period, EU & associated has the strongest citation performance, in relative terms, in MAP, while Self-healing is by far the weakest subfield.
For point (b) above, Karlsruhe Institute of Technology and Max Planck Society are EU & associated organizations with high publication volumes in both POOL and WIDE. In the different subfields, there is a large variability in the top EU & associated organizations regarding volume. For citation impact (cf and Ptop10%), the performance of the EU & associated organizations is quite variable with some performing well above world average and some with a more modest performance. Examples of high-performing (cf and Ptop10%) EU & associated organizations are ETH Zurich and Max Planck Society.
Regarding publication volume, China and JKS are strengthened in WIDE compared to POOL. However, China is weakened with regard to citation impact in WIDE compared to POOL. One possible interpretation of this difference is that WIDE is gathering a wider selection of publications, where some may have a more national focus or lower levels of international collaboration. This could have led to the outcome that China has a lower citation impact in WIDE relative to POOL.
The identification of subfields, based on seed articles followed by selection of relevant clusters, was straightforward for some subfields and challenging for others. For instance, the subfields Recyclability, Sensing and Self-healing had a rather strong cluster signal, and relevant clusters could be selected with relative ease. On the other hand, the subfields BIG and Manufacturability were more challenging, for a number of reasons. First, the potential clusters identified for these subfields (along with MAP) showed a large extent of overlap, and it was not easy to assign clusters to subfields. Since we aimed for non-overlapping cluster selections, each cluster could be selected for only one subfield. Second, some subfields from the Battery 2030+ roadmap are easier to define from a conceptual point of view than others. For instance, aspects of sensing and self-healing are easier to pinpoint than the more process-oriented and conceptual subfields BIG and MAP. Therefore, the selection of clusters for the more forward-looking subfields (BIG, MAP) was more challenging. The selection for Manufacturability was especially difficult, for two reasons: 1) it shared several potential clusters with BIG that were eventually used in the analysis for BIG, and 2) it is a newly emerging scientific field without a strong academic tradition. Historically, the subject has been closely tied to a few international companies, whose results have not necessarily been published in academic journals, which makes it more difficult to identify using our data and methodology.
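The overlap problem described above can be illustrated by mapping seed articles to the clusters they belong to. The sketch assumes a lookup from article IDs to cluster IDs in which each article has exactly one cluster; `candidate_clusters`, `overlapping_clusters` and all IDs are hypothetical. It only flags clusters that are candidates for more than one subfield; in this study, deciding which subfield such a cluster goes to was a matter of expert-based screening, which the code does not attempt to reproduce.

```python
from collections import Counter, defaultdict


def candidate_clusters(seeds_by_subfield, cluster_of):
    """Count seed articles per cluster, separately for each subfield.

    seeds_by_subfield: dict subfield -> list of seed article IDs.
    cluster_of: dict article ID -> cluster ID; each article has exactly one
        cluster. Seeds missing from the classification are skipped.
    """
    return {
        sub: Counter(cluster_of[a] for a in seeds if a in cluster_of)
        for sub, seeds in seeds_by_subfield.items()
    }


def overlapping_clusters(candidates):
    """Map each cluster that is a candidate for several subfields to those subfields."""
    owners = defaultdict(set)
    for sub, counts in candidates.items():
        for cluster in counts:
            owners[cluster].add(sub)
    return {c: subs for c, subs in owners.items() if len(subs) > 1}


# Hypothetical seed IDs and cluster assignments.
seeds = {
    "BIG": ["a1", "a2", "a3"],
    "Manufacturability": ["a4", "a5"],
    "Sensing": ["a6"],
}
cluster_of = {"a1": 101, "a2": 101, "a3": 102, "a4": 102, "a5": 103, "a6": 200}

cands = candidate_clusters(seeds, cluster_of)
print(overlapping_clusters(cands))
```

In this toy case, cluster 102 is flagged as shared by BIG and Manufacturability, while the Sensing seed falls in a cluster of its own, a pattern loosely resembling the stronger cluster signal reported for Sensing.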
Although the scope of BATTERY 2030+ is essentially chemistry neutral, lithium-based rechargeable battery chemistries today account for large publication volumes and, representing the highest-performing battery systems, act as a benchmark and take-off point for alternative chemistries. This current dominance of lithium-based chemistries is clearly reflected in the available literature in the field.
One thing to comment on is the overall scope of the study. The methodology used here, going from seed articles to potential clusters, followed by selection of clusters for each subfield (with the aim of targeting these specific subfields), probably leads to a relatively narrow interpretation of the battery field, primarily targeting the perceived scope of Battery 2030+. We also use two definitions of Battery 2030+ as a whole, POOL and WIDE, where the former creates the union of the subfields and WIDE selects a group of more general level-2 clusters that are connected to several of the subfields, but which are not tied specifically to any of them. However, to view the battery field as a whole, a much wider perspective could also have been utilized, where not only battery research but also related research and technologies from e.g. applied physics, chemistry and recycling technology could have been included. If using the same type of cluster methodology, such a study could have selected clusters at a higher level (level-3) or pooled a much larger set of level-2 clusters. Clearly, such a study would be more loosely tied to Battery 2030+, but might be relevant for an even wider overview of the relative strength of different geographical regions and research organizations.

Methodological limitations
Our approach is based on publication clustering, which in turn is based on direct citation relations. This has the advantage of providing a relatively objective basis for subject delineation, and it does not require time-consuming compilation and expert curation of publication sets that are deemed relevant for different subject areas. As such, the method is not sensitive to human biases regarding notions of subject field relations, literature from different parts of the world, etc. However, a crucial step in using clusters to represent subject fields lies in the identification and selection of clusters. In this study, we have used seed articles from the Battery 2030+ roadmap for identification of potential clusters, followed by expert-based screening of clusters.
Another thing to keep in mind is that each article in the modularity-based clustering is placed in exactly one cluster. This means that articles that fall in-between two subject areas will be placed in exactly one cluster. Among many potential ways to delineate subject areas, one will also dominate, based on the citation relations within the literature. As an example, of interest in this study, sensors in batteries can be approached both from a technical point of view (i.e. sensing technology, and ways to measure aspects of battery state) or from the approach of the battery states that need to be monitored and measured (i.e. state of charge, state of health etc.). In the article clusters, the second perspective dominates, mainly because of citation practices within the fields. However, this places a clear limitation on the selection of clusters, and it also means that studies of the technical perspective in the sensor example above will be more challenging to identify.
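The consequence of a hard, modularity-based partition can be seen on a toy direct-citation graph. Below, two tightly linked groups of articles are joined by one article that cites both. The sketch computes plain Newman-Girvan modularity for an undirected graph; the clustering algorithm and resolution settings behind the classification system used in this study are not described here, so this is a simplified stand-in rather than the actual procedure. The graph and function name are hypothetical.

```python
def modularity(edges, partition):
    """Newman-Girvan modularity of a hard partition of an undirected graph.

    edges: list of (u, v) pairs, e.g. direct citation links treated as undirected.
    partition: list of sets; every node must appear in exactly one set.
    """
    nodes = [n for block in partition for n in block]
    if len(nodes) != len(set(nodes)):
        raise ValueError("each article must belong to exactly one cluster")
    cluster_of = {n: i for i, block in enumerate(partition) for n in block}

    m = len(edges)
    internal = [0] * len(partition)    # edges with both ends in the cluster
    degree_sum = [0] * len(partition)  # summed degree of the cluster's nodes
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1

    # Q = sum over clusters of (L_c / m) - (d_c / 2m)^2
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


# Toy direct-citation links (hypothetical): two tight groups plus article 7,
# which cites article 3 in one group and article 4 in the other.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (7, 3), (7, 4)]

q_left = modularity(edges, [{1, 2, 3, 7}, {4, 5, 6}])
q_right = modularity(edges, [{1, 2, 3}, {4, 5, 6, 7}])
q_single = modularity(edges, [{1, 2, 3}, {4, 5, 6}, {7}])
print(q_left, q_right, q_single)  # 0.3671875 0.3671875 0.3515625
```

Attaching article 7 to either group gives the same modularity, while keeping it as a singleton scores lower. A modularity-maximizing method therefore places the in-between article in one of the two clusters, essentially arbitrarily, and it is absent from the other.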
Another approach to obtaining article sets for subfields is to use search queries. However, an advantage of the approach followed in this work compared to the search query approach is that the former is not dependent on the identification of search terms standing for the same or nearly the same concept. This is because the articles in the classification system have been clustered based on direct citation relations between them, and not based on textual similarity. The article set for a given subfield may contain articles (pertinent to the subfield) that treat a certain topic but use partially different terminologies. With the search query approach, the query used may fail to retrieve some of these articles. On the other hand, a possible advantage of search queries is that they allow fine-grained control over the selection, when this is needed, for a user with deep knowledge of the subject field. One further caveat worth mentioning is that the cluster selection for subfields has been relatively independent and can follow slightly different principles. For instance, the relevant literature can be seen in broad terms or more narrowly, as only directly relevant to e.g. lithium-ion batteries. As an example, the cluster selection for the subfield Self-healing is relatively broad, with the intention of selecting technologies probably relevant to self-healing in batteries, but not limited to batteries. Therefore, this selection of articles is quite large. On the other hand, in the subfield Recyclability a number of smaller clusters are instead selected, which are directly related to the recycling of batteries in general, and specifically to lithium-ion batteries. For recycling, a wider perspective could have been chosen, for instance including metal recycling from mining runoffs or circuit-board recycling, but this was not done here.
However, these different perspectives in subfield scope must be kept in mind when interpreting results, and especially in the POOL set, where the subfields with a broader selection will dominate the set and therefore also the mean-based indicators used.
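The contrast drawn above between search queries and citation-based clusters can be illustrated with a toy example. The titles, IDs and cluster membership below are hypothetical; the cluster is simply given as input (as if taken from a citation-based classification) rather than computed. A plain term query for "self-healing" misses an article that describes a related idea with different wording, whereas cluster membership does not depend on the wording.

```python
def query_hits(articles, terms):
    """IDs of articles whose text contains any query term (case-insensitive)."""
    return {
        aid
        for aid, text in articles.items()
        if any(term.lower() in text.lower() for term in terms)
    }


def recall(retrieved, relevant):
    """Share of the relevant articles that the query retrieved."""
    return len(retrieved & relevant) / len(relevant)


# Hypothetical titles; assume all three were placed in the same
# citation-based cluster because they cite one another.
titles = {
    "p1": "Self-healing polymer electrolytes for lithium batteries",
    "p2": "A self-healing binder for silicon anodes",
    "p3": "Autonomic repair of cracked electrode particles",
}
cluster = {"p1", "p2", "p3"}

hits = query_hits(titles, ["self-healing"])
print(sorted(hits), recall(hits, cluster))
hits_extended = query_hits(titles, ["self-healing", "autonomic repair"])
print(recall(hits_extended, cluster))  # 1.0
```

The single-term query retrieves two of the three cluster articles. Adding a second term recovers the third, which illustrates the fine-grained control a knowledgeable user gains with queries, at the price of having to anticipate the terminology in advance.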

Conclusions
We put forward tentative conclusions and observations in the list below. These can in part be considered in the planned second bibliometric analysis, referred to in the section "Introduction".
• EU & associated countries and organizations are relatively well represented in most subfields, but often lag behind North America in publication volume and citation impact. In POOL and WIDE, China also shows stronger citation impact than EU & associated towards the end of the study period.
• Looking at the specific subfields and focusing on the end of the study period, EU & associated has the strongest citation performance, in relative terms, in MAP, while Self-healing is by far the weakest subfield.
• None of the themes in Battery 2030+ are established fields; rather, they are emerging multidisciplinary scientific fields, which vary strongly in the degree to which they are connected to traditional subfields in battery research.
• The themes are expected to become more nested with time. For instance, the topics of recyclability and self-healing have so far primarily been applied in other areas of research (such as biomaterials), with only a few links over to batteries.
• From our study it is also clear that topics in BIG and MAP, such as neural networks, are currently mostly applied to battery electrolytes, but aspects associated with the electrodes are expected to receive increasing focus in the future.
• For Recyclability, the areas of traditional battery recycling and of electric vehicles with life cycle analysis are clearly observed as two distinct networks. The intensive research efforts on both batteries and electric vehicles today will likely reduce this separation, as a more holistic approach to recyclability is needed.
• Although the concepts of importance for future manufacturability, such as microstructure and various simulation approaches, appear only as small nodes, they are present in the existing literature and are expected to grow in importance, not least as a result of the growing efforts within Battery 2030+.
• The clusters in the Sensing subfield are clearly divided into two parts: one related to battery performance characteristics, which are intended to be probed by sensors, and the other related to the technical aspects of sensor operation. Although optically based methods are primarily represented, a number of other sensing technologies are gaining momentum (e.g. acoustic emission sensing) and are expected to result in significantly higher publication volumes in the future.
Appendix 1. Clusters used for the definitions of WIDE, the subfields, and POOL