Uppsala University Publications
Search results 201-250 of 252
  • 201.
    Oltner, Alexander Mac
    Gotland University, School of Game Design, Technology and Learning Processes.
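    Att överföra en turordningsbaserad spelprototyp till realtid: ett projekt rörande Victorious Skies och dess utveckling [Transferring a turn-based game prototype to real time: a project concerning Victorious Skies and its development] (2011). Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis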
    Abstract [en]

    This project details the process of converting a turn-based paper prototype into a digital real-time format. The project's goals were to see how well the original feel of the game could be carried over to real time and how the transition itself would go. The project was completed with the help of the programmer Mikael Gullberg, and its practical part was carried out between 25 April and 2 May. It is part of the larger Victorious Skies project. During the project, values were converted and properties of the turn-based paper prototype were adapted for use in real time. It was a very interesting and rewarding project that challenged us and presented numerous choices about how to carry out the conversions. The work was organized using a priority list. I had sole responsibility for design choices, while Mikael Gullberg provided the programming knowledge needed to convert the prototype to a digital format. The result is a real-time digital prototype that serves as Victorious Skies' first such prototype. Knowledge about conversions of this kind was gathered through practical testing. The digital prototype is true enough to the original that a clear connection between the two can be seen, although they differ in several key aspects because of the changes the real-time format brought with it.

    I arrived at several conclusions during this project. The most important is that there is no way to keep the exact original feel, because the real-time format brings too many new factors into play. Transferring a game from turn-based to real-time play follows no simple formula and requires a lot of thought. To be done right, it demands attention to minute details, which makes the conversion process both challenging and entertaining.

  • 202.
    Orfanidis, Charalampos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Robustness in low power wide area networks (2018). Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    During the past few years we have witnessed the emergence of Low Power Wide Area Networks (LPWANs) in the Internet of Things. Several new technologies, such as LoRa, Wi-SUN, and Sigfox, offer long-range, low-power communication for low-bitrate applications. These technologies enable new application scenarios such as smart cities and smart agriculture. However, when such networks coexist in the same frequency band, they may interfere with each other because they are heterogeneous and independent, so frame collisions between the different networks are likely.

    In this thesis we first explore how tolerant these networks are to Cross-Technology Interference (CTI), that is, interference from heterogeneous wireless technologies sharing the same frequency band, which can degrade the robustness and reliability of a network. In particular, we select two of them, LoRa and Wi-SUN, and carry out a series of experiments on real hardware using several configurations. In this way, we quantify how tolerant each network is to interference from the other and identify which configuration settings matter.

    Next, we explore how well channel sensing mechanisms can detect the other network technologies and how they can be improved. To this end, we evaluate the accuracy of Wi-SUN's default Clear Channel Assessment (CCA) mechanism against LoRa interference, and we improve the mechanism so that it detects LoRa interference with higher accuracy.

    Finally, we propose an architecture for Wireless Sensor Networks (WSNs) that enables flexible reconfiguration of the nodes. The idea is based on Software-Defined Networking (SDN) principles and could help in our case by reconfiguring a node to mitigate cross-technology interference from other networks.
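
    The CCA evaluation described in this abstract rests on energy detection: a node samples the received signal strength and declares the channel busy if it reaches a threshold. The Python sketch below contrasts a single-sample check with a check over a window of samples, one hypothetical way to catch an intermittent interferer that a single short sample can miss. The -85 dBm threshold, the window and the `min_hits` parameter are illustrative assumptions, not Wi-SUN parameters and not the improvement proposed in the thesis.

```python
from typing import Sequence

# Illustrative energy-detection threshold (an assumption, not a Wi-SUN value).
THRESHOLD_DBM = -85.0


def cca_single(rssi_dbm: Sequence[float], threshold_dbm: float = THRESHOLD_DBM) -> bool:
    """Busy if the first RSSI sample reaches the threshold (single-shot CCA)."""
    return rssi_dbm[0] >= threshold_dbm


def cca_window(
    rssi_dbm: Sequence[float],
    threshold_dbm: float = THRESHOLD_DBM,
    min_hits: int = 1,
) -> bool:
    """Busy if at least `min_hits` samples in the window reach the threshold."""
    hits = sum(1 for sample in rssi_dbm if sample >= threshold_dbm)
    return hits >= min_hits


# A burst of interference appears only in the third sample.
samples = [-95.0, -94.0, -80.0, -96.0]
print(cca_single(samples), cca_window(samples))  # False True
```

    Raising `min_hits` above 1 trades sensitivity for robustness against isolated noise spikes.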

    List of papers
    1. Investigating interference between LoRa and IEEE 802.15.4g networks
    2017 (English). In: Proc. 13th International Conference on Wireless and Mobile Computing, Networking and Communications, IEEE, 2017, p. 441-448. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE, 2017
    National Category
    Communication Systems
    Identifiers
    urn:nbn:se:uu:diva-331851 (URN); 10.1109/WiMOB.2017.8115772 (DOI); 000419818000061; 978-1-5386-3839-2 (ISBN)
    Conference
    WiMob 2017, October 9–11, Rome, Italy
    Available from: 2017-11-23 Created: 2017-10-18 Last updated: 2018-05-31. Bibliographically approved
    2. Improving LoRa/IEEE 802.15.4g co-existence
    (English)Manuscript (preprint) (Other academic)
    National Category
    Communication Systems
    Identifiers
    urn:nbn:se:uu:diva-351504 (URN)
    Available from: 2018-05-28 Created: 2018-05-28 Last updated: 2018-05-31
    3. Using software-defined networking principles for wireless sensor networks
    2015 (English). In: Proc. 11th Swedish National Computer Networking Workshop, 2015. Conference paper, Published paper (Refereed)
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-254172 (URN)
    Conference
    SNCNW 2015, May 28–29, Karlstad, Sweden
    Projects
    ProFuN
    Funder
    Swedish Foundation for Strategic Research, RIT08-0065
    Available from: 2015-06-05 Created: 2015-06-05 Last updated: 2018-05-31. Bibliographically approved
  • 203.
    Pan, Xiaoyue
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jonsson, Bengt
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    A Modeling Framework for Reuse Distance-based Estimation of Cache Performance (2015). In: Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on, IEEE, 2015, p. 62-71. Conference paper (Refereed)
    Abstract [en]

    We develop an analytical modeling framework for efficient prediction of cache miss ratios based on reuse distance distributions. The only input needed for our predictions is the reuse distance distribution of a program execution; previous work has shown that such distributions can be obtained with very small overhead by sampling from native executions. This contrasts with previous approaches that base predictions on stack distance distributions, whose collection needs significantly larger overhead or additional hardware support. The predictions are based on a uniform modeling framework that can be specialized for a variety of cache replacement policies, including Random, LRU, PLRU, and MRU (a.k.a. bit-PLRU), and for arbitrary values of cache size and cache associativity. We evaluate our modeling framework with the SPEC CPU 2006 benchmark suite over a set of cache configurations with varying cache size, associativity, and replacement policy. The inaccuracies introduced by the policy model were generally below 1%, with around 2% more when set-local reuse distances must be estimated from global reuse distance distributions. The inaccuracy introduced by sampling is significantly smaller.
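
    For intuition about predicting miss ratios from reuse distances alone, the Python sketch below implements a classic fixed-point model for the simplest case, a fully associative cache with random replacement; it is not the paper's framework, which also covers LRU, PLRU, MRU and set-associative caches. Reuse distance here means the number of memory accesses between two consecutive accesses to the same cache line. Between such accesses roughly m·d misses occur (m being the miss ratio), each evicting a random line, so for a cache of L lines the line survives with probability (1 - 1/L)^(m·d). The miss ratio therefore satisfies m = c + Σ_d p(d)·(1 - (1 - 1/L)^(m·d)), where c is the fraction of cold accesses. The function name and parameters are hypothetical.

```python
from typing import Mapping


def random_replacement_miss_ratio(
    reuse_hist: Mapping[int, float],
    cache_lines: int,
    cold_fraction: float = 0.0,
    max_iters: int = 10_000,
    tol: float = 1e-12,
) -> float:
    """Estimate the miss ratio of a fully associative random-replacement cache.

    reuse_hist maps a reuse distance d to the fraction of all accesses with that
    distance; cold_fraction is the fraction of first-time accesses (always misses).
    The fractions are assumed to sum to at most 1.
    """
    survive = 1.0 - 1.0 / cache_lines  # chance a given line survives one random eviction
    m = 1.0  # start at the upper bound; the update is monotone, so m only decreases
    for _ in range(max_iters):
        # About m*d misses happen during d accesses, each evicting a random line.
        new_m = cold_fraction + sum(
            p * (1.0 - survive ** (m * d)) for d, p in reuse_hist.items()
        )
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


hist = {5: 0.5, 500: 0.4}  # hypothetical reuse distance histogram
print(random_replacement_miss_ratio(hist, 64, cold_fraction=0.1))
print(random_replacement_miss_ratio(hist, 4096, cold_fraction=0.1))
```

    Because the right-hand side grows with m and never exceeds 1, iterating from m = 1 decreases monotonically to the largest solution of the equation.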

  • 204.
    Pan, Xiaoyue
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jonsson, Bengt
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Modeling cache coherence misses on multicores (2014). In: 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, 2014, p. 96-105. Conference paper (Refereed)
    Abstract [en]

    While maintaining the coherency of private caches, invalidation-based cache coherence protocols introduce cache coherence misses. We address the problem of predicting the number of cache coherence misses in the private cache of a parallel application running on a multicore system with an invalidation-based cache coherence protocol. We propose three new performance models (uniform, phased, and symmetric) for estimating the number of coherence misses from information about inter-core data sharing patterns and each core's data reuse patterns. The inputs to the uniform and phased models are the write frequency and reuse distance distribution of shared data from different cores. This input can be obtained either by profiling the target application on a single core or by analyzing the data access pattern statically, and it does not require a detailed simulation of the interleaving of accesses to shared data. The output of the models is an estimated number of coherence misses of the target application. This output can be combined with estimates of other kinds of misses to estimate the total number of misses in each core's private cache, and it can also guide program optimization to improve cache performance. We evaluate our models with a set of benchmarks from the PARSEC benchmark suite on real hardware.
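
    To make the roles of reuse distances and write frequencies concrete, the Python sketch below computes a deliberately simplified estimate, not one of the paper's uniform, phased, or symmetric models. It assumes, hypothetically, that during each of a core's accesses another core writes the line being reused with a fixed, independent probability q; a reuse at distance d then finds its copy invalidated with probability 1 - (1 - q)^d.

```python
from typing import Mapping


def expected_coherence_misses(
    shared_reuse_hist: Mapping[int, int],
    remote_write_prob: float,
) -> float:
    """Toy estimate of coherence misses for one core's reuses of shared data.

    shared_reuse_hist maps a reuse distance d (in this core's accesses) to the
    number of reuses of shared lines with that distance. remote_write_prob is an
    assumed per-access probability that another core writes the reused line,
    invalidating this core's copy.
    """
    if not 0.0 <= remote_write_prob <= 1.0:
        raise ValueError("remote_write_prob must be in [0, 1]")
    keep = 1.0 - remote_write_prob
    # A reuse misses if at least one remote write hit the line during its d accesses.
    return sum(n * (1.0 - keep ** d) for d, n in shared_reuse_hist.items())


print(expected_coherence_misses({1: 10, 100: 5}, 0.01))
```

    Long reuse distances dominate the estimate, since they leave more time for a remote write to invalidate the line.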

  • 205.
    Paçacı, Görkem
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Information Systems.
    Hamfelt, Andreas
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Information Systems.
    A Visual System for Compositional Relational Programming (2013). Conference paper (Refereed)
    Abstract [en]

    Combilog is a compositional relational programming language that allows relational logic programs to be written by functionally composing relational predicates. Higraphs, a diagram formalism, are used to reduce some of the textual complexity of compositional relational programming, yielding a visual system that can represent these declarative meta-programs; the ultimate aim is an intuitive, visually assisted, complete development practice. As a proof of concept, an implementation of a two-way parser/visualizer is presented.

  • 206.
    Perais, Arthur
    et al.
    IRISA INRIA, Rennes, France.
    Seznec, André
    IRISA INRIA, Rennes, France.
    Michaud, Pierre
    IRISA INRIA, Rennes, France.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Cost-effective speculative scheduling in high performance processors (2015). In: Proc. 42nd International Symposium on Computer Architecture, New York: ACM Press, 2015, p. 247-259. Conference paper (Refereed)
    Abstract [en]

    To maximize performance, out-of-order execution processors sometimes issue instructions without a guarantee that their operands will be available in time; e.g., loads are typically assumed to hit in the L1 cache, and dependent instructions are issued accordingly. This form of speculation, which we refer to as speculative scheduling, has been used for two decades in real processors but has received little attention from the research community. In particular, as pipeline depth grows and the distance between the Issue and Execute stages increases, it becomes critical to issue instructions that depend on variable-latency instructions as soon as possible rather than wait for the actual cycle at which the result becomes available. Unfortunately, because speculative scheduling is uncertain, the scheduler may wrongly issue an instruction whose source(s) will not be available on the bypass network when it reaches the Execute stage. In that event, the instruction is canceled and replayed, potentially impairing performance and increasing energy consumption. In this work, we do not present a new replay mechanism. Rather, we focus on ways of reducing the number of replays that are independent of the replay scheme. First, we propose an easily implementable, low-cost solution to reduce the number of replays caused by L1 bank conflicts. Schedule shifting always assumes that, given a dual-load issue capacity, the second load issued in a given cycle will be delayed by a bank conflict; its dependents are therefore always issued with the corresponding delay. Second, we improve on existing L1 hit/miss prediction schemes by taking instruction criticality into account. That is, for some criterion of criticality and for loads whose hit/miss behavior is hard to predict, we show that it is more cost-effective to stall dependents if the load is not predicted critical.
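
    The schedule-shifting rule described in this abstract is simple enough to express directly. The Python sketch below computes, for a sequence of loads tagged with their issue cycles, the cycle at which the scheduler would speculatively wake their dependents, assuming L1 hits and at most two loads per cycle; the second load in a cycle always receives the bank-conflict penalty, whether or not a conflict actually occurs. The 4-cycle hit latency and 1-cycle penalty are hypothetical values.

```python
from collections import defaultdict
from typing import Dict, List


def dependent_issue_cycles(
    load_issue_cycles: List[int],
    hit_latency: int = 4,
    conflict_penalty: int = 1,
) -> List[int]:
    """For each load (in issue order), return the cycle at which its dependents
    are speculatively scheduled, assuming an L1 hit.

    Schedule shifting: with two load ports, the second load issued in a cycle is
    always assumed to suffer a bank conflict, so its dependents wait
    `conflict_penalty` extra cycles.
    """
    loads_in_cycle: Dict[int, int] = defaultdict(int)
    wakeups: List[int] = []
    for cycle in load_issue_cycles:
        slot = loads_in_cycle[cycle]
        if slot >= 2:
            raise ValueError(f"more than two loads issued in cycle {cycle}")
        loads_in_cycle[cycle] += 1
        delay = conflict_penalty if slot == 1 else 0
        wakeups.append(cycle + hit_latency + delay)
    return wakeups


print(dependent_issue_cycles([0, 0, 1]))  # [4, 5, 5]
```

    The design choice is to pay a fixed one-cycle delay on every second load instead of an occasional, much costlier replay.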

  • 207.
    Persson, Måns
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Waern, Tom
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Automatic adjustments of NC programs in machining centers (2018). Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The goal of this master's thesis was to automate the compensation of NC programs. Automatic compensation can reduce errors and make production more efficient, which is vital for increasing precision and meeting the market's quality demands. The project started with a study of how the feedback loop between production and measurement worked at the time, together with research into how data could be sent between the different machines. This was done by studying solutions to similar problems and by interviewing the machine operators. Simulations of how automation could be achieved with more in-depth measurements of the production machine were also made. The limitations were evaluated as well, and research was done on errors and practical flaws that could be problematic for automation. The automation was implemented in Java, which sends the data from the measuring machine to the production machine. Furthermore, a UI was created for the machine operators so that the information flow was under supervision at all times. The UI suggested a compensation computed by a pre-programmed algorithm from the measurement data, and the operator could then decide whether or not to deviate from the suggested compensation.

  • 208.
    Persson, Tobias
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Fredlund, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Motor control under strong vibrations (2018). Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
  • 209.
    Qiu, Lanxin
    et al.
    Beijing Univ Technol, Beijing, Peoples R China.
    Huang, Zhuangqin
    Beijing Univ Technol, Beijing, Peoples R China.
    Wirström, Niklas
    SICS Swedish ICT, Kista, Sweden.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. SICS Swedish ICT, Kista, Sweden.
    3DinSAR: Object 3D Localization for Indoor RFID Applications (2016). In: IEEE RFID, 2016, p. 191-198. Conference paper (Refereed)
    Abstract [en]

    More and more objects can be identified and sensed with RFID tags. Existing schemes for 2D indoor localization have achieved impressive accuracy. In this paper we propose an accurate 3D localization scheme for objects. Inspired by the height determination theory of phase-based Interferometric Synthetic Aperture Radar (InSAR), our scheme leverages spatial-domain phase differences to estimate the height of objects. We further use a density-based spatial clustering method to choose the most likely position and show that this improves accuracy. Our localization method does not need any reference tags; only one antenna is required, moved along a known path to construct the synthetic arrays that implement the localization system. We present experimental results from an indoor office environment with EPC C1G2 passive tags and a COTS RFID reader. Our 3D experiments demonstrate a spatial median error of 0.24 m. This novel 3D localization scheme is a simple yet promising solution, and we believe it is especially applicable to both portable readers and transport vehicles.
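
    The height-from-phase idea can be illustrated in one dimension. The Python sketch below is a simplified analogue, not the 3DinSAR algorithm: the antenna is assumed to move vertically through known heights at a known horizontal distance from the tag, the backscatter phase is modelled as 4πd/λ plus an unknown constant offset, and the tag height is found by grid search, comparing phase differences relative to the first antenna position so that the offset cancels. The 0.33 m wavelength, the geometry and all function names are illustrative assumptions, and the clustering step is omitted.

```python
import math
from typing import Sequence

TWO_PI = 2.0 * math.pi


def wrap(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def backscatter_phase(distance_m: float, wavelength_m: float, offset: float = 0.0) -> float:
    """Round-trip phase 4*pi*d/lambda plus a constant offset, modulo 2*pi."""
    return (4.0 * math.pi * distance_m / wavelength_m + offset) % TWO_PI


def estimate_height(
    antenna_z: Sequence[float],
    phases: Sequence[float],
    horizontal_range_m: float,
    wavelength_m: float,
    candidates: Sequence[float],
) -> float:
    """Grid-search the tag height whose predicted phase differences best match."""
    best_h, best_err = candidates[0], math.inf
    for h in candidates:
        d0 = math.hypot(horizontal_range_m, h - antenna_z[0])
        err = 0.0
        for z, phi in zip(antenna_z[1:], phases[1:]):
            dk = math.hypot(horizontal_range_m, h - z)
            predicted = 4.0 * math.pi * (dk - d0) / wavelength_m
            # Differences relative to the first position cancel the unknown offset.
            err += wrap((phi - phases[0]) - predicted) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h


# Simulated example: antenna moved from 0 to 1 m, tag 1.5 m away at 0.8 m height.
WAVELENGTH = 0.33  # roughly a UHF RFID wavelength (assumption)
zs = [i * 0.05 for i in range(21)]
measured = [backscatter_phase(math.hypot(1.5, 0.8 - z), WAVELENGTH, offset=1.0) for z in zs]
print(estimate_height(zs, measured, 1.5, WAVELENGTH, [i * 0.01 for i in range(201)]))
```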

  • 210.
    Rademacher, Frans
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Larsson, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Lundberg, Oskar
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Electronics.
    Praktisk konstruktion av 8-bitarsdator [Practical construction of an 8-bit computer] (2019). Independent thesis Basic level (degree of Bachelor of Fine Arts), 10 credits / 15 HE credits. Student thesis
    Abstract [sv] (English translation)

    In today's society, an 8-bit computer is old technology. Such computers can hardly compete with modern computers, which work faster and with larger numbers. Nevertheless, building an 8-bit computer part by part still gives great insight into how computers in general are constructed. With background knowledge of basic digital electronics, the individual modules can be understood, which in turn leads to an understanding of the computer as a whole. This project therefore revolved around constructing an 8-bit computer, which is intended to remain in use after the project for teaching digital electronics. The 8-bit computer comprises several modules, each of which can be both simulated in software and built separately; all modules could then be assembled. The computer can easily be programmed to run different programs and, with the help of so-called flags, can jump in the program code to repeat code. The resulting computer has some room for improvement but works well, as expected. A strategic choice of wire colors and a large number of LEDs made the computer easier to understand and examine.
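
    Flag-based jumps are what let such a computer repeat code. The Python sketch below emulates a hypothetical accumulator machine in the same spirit, with carry and zero flags and the conditional jumps JC and JZ; the instruction set, the 16-byte data memory and the separate program list are illustrative assumptions, not the design built in the project.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[int]]


def run(program: List[Instr], data: Dict[int, int], max_steps: int = 1000) -> List[int]:
    """Execute `program` on a toy 8-bit accumulator machine; return the OUT values."""
    mem = [0] * 16  # 16 bytes of data memory (assumption)
    for addr, value in data.items():
        mem[addr] = value & 0xFF
    a, pc, carry, zero = 0, 0, False, False
    out: List[int] = []
    for _ in range(max_steps):
        op, arg = program[pc]
        pc += 1
        if op == "LDA":  # load accumulator from memory
            a = mem[arg]
        elif op in ("ADD", "SUB"):  # arithmetic updates both flags
            total = a + mem[arg] if op == "ADD" else a - mem[arg]
            carry = total > 0xFF if op == "ADD" else total >= 0  # SUB: carry = no borrow
            a = total & 0xFF
            zero = a == 0
        elif op == "STA":  # store accumulator to memory
            mem[arg] = a
        elif op == "JMP":
            pc = arg
        elif op == "JC":  # conditional jumps read the flags
            if carry:
                pc = arg
        elif op == "JZ":
            if zero:
                pc = arg
        elif op == "OUT":
            out.append(a)
        elif op == "HLT":
            return out
        else:
            raise ValueError(f"unknown instruction {op}")
    raise RuntimeError("step limit reached without HLT")


# Count down from mem[14] to 1, outputting each value; JZ leaves the loop.
COUNTDOWN: List[Instr] = [
    ("LDA", 14),    # 0: A = counter
    ("OUT", None),  # 1: show A
    ("SUB", 15),    # 2: A = A - 1, sets Z when A reaches 0
    ("STA", 14),    # 3: counter = A
    ("JZ", 6),      # 4: leave the loop when zero
    ("JMP", 1),     # 5: repeat
    ("HLT", None),  # 6: stop
]
print(run(COUNTDOWN, {14: 3, 15: 1}))  # [3, 2, 1]
```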

  • 211.
    Romeo, Luca
    et al.
    Univ Politecn Marche, Dept Informat Engn, Ancona, Italy;Fdn Ist Italiano Tecnol Genova, Cognit Mot & Neurosci & Computat Stat & Machine L, Genoa, Italy.
    Paolanti, Marina
    Univ Politecn Marche, Dept Informat Engn, Ancona, Italy.
    Bocchini, Gianluca
    Xelexia Srl, Pesaro, Italy.
    Loncarski, Jelena
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Electricity.
    Frontoni, Emanuele
    Univ Politecn Marche, Dept Informat Engn, Ancona, Italy.
    An Innovative Design Support System for Industry 4.0 Based on Machine Learning Approaches (2018). In: 2018 5th International Symposium on Environment-Friendly Energies and Applications (EFEA) / [ed] Bruzzese, C.; Santini, E.; Digennaro, S., IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    Electric machines together with power electronic converters are major components in industrial and automotive applications. A frequent situation in engineering practice is that designers and final or intermediate users must roughly estimate basic performance data, specification data, or other metrics related to their specific task on the basis of the few data available at a particular time or at the time of use. This paper addresses this problem in the Industry 4.0 scenario by introducing an innovative Design Support System (DesSS), derived from the Decision Support System (DSS), for predicting and estimating machine specification data, such as machine geometry and machine design, from other heterogeneous parameters (i.e., motor performance, field of application, geographic market, and range of cost). To develop the DesSS, different machine learning techniques were compared, such as Decision/Regression Trees (DT/RT), Nearest Neighbors (NN), and Neighborhood Component Feature Selection (NCFS). Experimental results on a real use case demonstrated that machine learning approaches are appropriate as the core of the DesSS for estimating the machine parameters. In particular, the results show high reliability, in terms of accuracy and macro-F1 score, of 1-NN+NCFS for the classification task and of RT for the regression task. This approach can viably replace the model-based tools used for parameter prediction, as it is more accurate and computationally faster.
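
    The classification side of the DesSS relies on a nearest-neighbor rule applied after feature weighting. The Python sketch below shows a 1-NN classifier with per-feature weights; the weights are supplied by hand rather than learned (a method such as NCFS would produce them), the feature names and class labels are hypothetical, and the regression-tree part is not covered. Setting a weight to zero removes that feature from the distance, which is the effect feature selection aims for.

```python
import math
from typing import Optional, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def weighted_1nn(
    train: Sequence[Example],
    query: Sequence[float],
    weights: Sequence[float],
) -> Optional[str]:
    """Return the label of the training example nearest to `query`.

    Distance is the weighted squared Euclidean distance sum_l (w_l * (x_l - q_l))**2;
    returns None if there are no training examples.
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for features, label in train:
        dist = sum((w * (x - q)) ** 2 for w, x, q in zip(weights, features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical normalized features: (rated power, cost range) -> machine frame class.
train = [((0.1, 0.9), "frame-A"), ((0.8, 0.1), "frame-B")]
print(weighted_1nn(train, (0.15, 0.1), weights=(1.0, 1.0)))  # frame-B
print(weighted_1nn(train, (0.15, 0.1), weights=(1.0, 0.0)))  # frame-A
```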

  • 212.
    Ros, Alberto
    et al.
    Univ Murcia, Dept Comp Engn, E-30001 Murcia, Spain.
    Davari, Mahdad
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies (2015). In: Proc. 21st International Symposium on High Performance Computer Architecture, IEEE Computer Society Digital Library, 2015, p. 186-197. Conference paper (Refereed)
    Abstract [en]

    Hierarchical clustered cache designs are becoming an appealing alternative for multicores. Grouping cores and their caches in clusters reduces network congestion by localizing traffic among several hierarchical levels, potentially enabling much higher scalability. While such architectures can be formed recursively by replicating a base design pattern, keeping the whole hierarchy coherent requires more effort and consideration. The reason is that, in hierarchical coherence, even basic operations must be recursive. As a consequence, intermediate-level caches behave both as directories and as leaf caches. This leads to an explosion of states, protocol races, and protocol complexity. While there have been previous efforts to extend directory-based coherence to hierarchical designs, their increased complexity and verification cost are a serious impediment to their adoption. We aim to address these concerns by encapsulating all hierarchical complexity in a simple function: determining when a data block is shared entirely within a cluster (a sub-tree of the hierarchy) and is private from the outside. This allows us to eliminate complex recursive operations that span the hierarchy and instead employ simple coherence mechanisms, such as self-invalidation and write-through, now restricted to operate within the cluster where a data block is shared. We examine two inclusivity options and discuss the relation of our approach to the recently proposed Hierarchical-Race-Free (HRF) memory models. Finally, comparisons with hierarchical directory-based MOESI, VIPS-M, and TokenCMP protocols show that, despite its simplicity, our approach results in competitive performance and decreased network traffic.
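
    The central function described in this abstract, deciding within which cluster a block is shared, can be sketched as a search for the smallest cluster that contains every core accessing the block. The Python sketch below assumes, hypothetically, that cores are numbered so that each cluster at a level holds a contiguous range of `size` cores (for example 1 core per L1, 4 per L2 cluster, 16 per L3 cluster); it says nothing about how the hardware actually tracks sharers.

```python
from typing import Iterable, Sequence, Tuple


def classify_block(sharers: Iterable[int], cluster_sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (level, cluster_id) of the smallest cluster containing every sharer.

    cluster_sizes lists cores per cluster from the innermost to the outermost
    level, e.g. (1, 4, 16). Level 0 with size 1 means the block is private to
    one core; a higher level means it is shared only inside that cluster and
    private from the outside.
    """
    cores = set(sharers)
    if not cores:
        raise ValueError("block has no sharers")
    for level, size in enumerate(cluster_sizes):
        cluster_ids = {core // size for core in cores}
        if len(cluster_ids) == 1:
            return level, cluster_ids.pop()
    raise ValueError("sharers span more than the outermost cluster")


print(classify_block({4, 7}, (1, 4, 16)))  # (1, 1)
```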

  • 213.
    Ros, Alberto
    et al.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Non-Speculative Store Coalescing in Total Store Order (2018). In: Proc. 45th International Symposium on Computer Architecture, IEEE, 2018, p. 221-234. Conference paper (Refereed)
    Abstract [en]

    We present a non-speculative solution for a coalescing store buffer in total store order (TSO) consistency. Coalescing violates TSO with respect to both conflicting loads and conflicting stores if partial state is exposed to the memory system. Previously proposed solutions for coalescing in TSO resort to speculation-and-rollback or centralized arbitration to guarantee atomicity for the set of stores whose order is affected by coalescing. These solutions can suffer from scalability, complexity, resource-conflict deadlock, and livelock problems. A non-speculative solution that writes out coalesced cachelines one at a time over a typical directory-based MESI coherence layer has the potential to overcome these problems if it can guarantee the absence of deadlock in a practical way. There are two major problems for a non-speculative coalescing store buffer: i) how to present a group of coalesced writes to the memory system as atomic, and ii) how to avoid deadlock while doing so. To this end, we introduce a new lexicographical order. Relying on this order, conflicting atomic groups of coalesced writes can be performed individually per cache block, without speculation, rollback, or replay, and without deadlock or livelock, yet appear atomic to conflicting parties and preserve TSO. One of our major contributions is showing that lexicographical orders based on a small part of the physical address (sub-address order) are deadlock-free throughout the system when resource-conflict deadlocks are taken into account. Our approach exceeds the performance and energy benefits of two baseline TSO store buffers and matches the coalescing (and energy savings) of a release-consistency store buffer, at comparable cost.
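
    The role of a lexicographical order can be illustrated by how a coalesced group is sequenced: if every core writes out the cachelines of its group in the same global order, the classic ordered-acquisition argument rules out two groups each waiting for a line the other holds. The Python sketch below orders a group's cachelines by a sub-address, a few low-order bits of the line address, breaking ties by the full line address. The 64-byte lines, the 4-bit sub-address and the tie-break are illustrative assumptions, and the sketch does not model the paper's deadlock-freedom argument for resource conflicts.

```python
from typing import Iterable, List

LINE_BITS = 6       # 64-byte cache lines (assumption)
SUB_ADDR_BITS = 4   # width of the sub-address used for ordering (assumption)


def line_address(addr: int) -> int:
    """Clear the byte-offset bits, giving the address of the containing cacheline."""
    return (addr >> LINE_BITS) << LINE_BITS


def sub_address(line_addr: int) -> int:
    """Low-order bits of the line number, used as the primary ordering key."""
    return (line_addr >> LINE_BITS) & ((1 << SUB_ADDR_BITS) - 1)


def writeout_order(store_addrs: Iterable[int]) -> List[int]:
    """Coalesce store addresses into cachelines and order them for one-at-a-time write-out."""
    lines = {line_address(a) for a in store_addrs}
    return sorted(lines, key=lambda line: (sub_address(line), line))


print([hex(line) for line in writeout_order([0x1040, 0x1044, 0x0080, 0x2000])])
# ['0x2000', '0x1040', '0x80']
```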

  • 214.
    Ros, Alberto
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    The Superfluous Load Queue (2018). In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, 2018, p. 95-107. Conference paper (Refereed)
    Abstract [en]

    In an out-of-order core, the load queue (LQ), the store queue (SQ), and the store buffer (SB) are responsible for ensuring i) correct forwarding of stores to loads and ii) correct ordering among loads (with respect to external stores). The first requirement safeguards the sequential semantics of program execution and applies to both serial and parallel code; the second safeguards the semantics of coherence and consistency (e.g., TSO). In particular, loads search the SQ/SB for the latest value that may have been produced by a store, and stores and invalidations search the LQ to find speculative loads in case they violate uniprocessor or multiprocessor ordering. To meet timing constraints, the LQ and SQ/SB are built from CAM structures that are searched frequently. This results in high complexity, high cost, and significant difficulty in scaling, yet it is the current state of the art. Prior research demonstrated the feasibility of a non-associative LQ by replaying loads at commit. There is a steep cost, however: a significant increase in L1 accesses and contention for L1 ports. This is because prior work assumes Sequential Consistency and completely ignores the existence of an SB in the system. In contrast, we intentionally delay stores in the SB to gain complete control over stores and loads within a core, while still supporting TSO. Our main result is that we eliminate the LQ without burdening the L1 with extra accesses. Store forwarding is achieved by delaying our own stores until speculatively issued loads are validated on commit, entirely in-core; TSO load->load ordering is preserved by delaying remote external stores in their SB until our own speculative reordered loads commit. While the latter is inspired by recent work on non-speculative load reordering, our contribution here is to show that it can be accomplished without a load queue. Eliminating the LQ results in both energy savings and a performance improvement from the elimination of LQ-induced stalls.

  • 215.
    Rostampour, Vahab
    et al.
    Ferrari, Riccardo
    Teixeira, André M.H.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signals and Systems Group.
    Keviczky, Tamás
    Differentially-Private Distributed Fault Diagnosis for Large-Scale Nonlinear Uncertain Systems (2018). In: IFAC-PapersOnLine, ISSN 2405-8963, Vol. 51, no 24, p. 975-982. Article in journal (Refereed)
    Abstract [en]

    Distributed fault diagnosis has been proposed as an effective technique for monitoring large-scale, nonlinear, and uncertain systems. It is based on decomposing the large-scale system into a number of interconnected subsystems, each monitored by a dedicated Local Fault Detector (LFD). To account for the interconnections between subsystems, neighboring LFDs must exchange some of the measurements from their subsystems. However, such communication may expose private information about a given subsystem, such as its local input. To avoid this problem, we propose using differential privacy to pre-process data before transmission.
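
    One standard way to apply differential privacy to transmitted measurements is the Laplace mechanism, which adds zero-mean Laplace noise with scale Δ/ε, where Δ is the L1 sensitivity of the transmitted values and ε is the privacy budget. The Python sketch below illustrates only that textbook mechanism; the paper's actual mechanism, sensitivity analysis and parameter choices may differ. It draws Laplace noise as the difference of two exponential variables, which needs only the standard library; the measurement values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def laplace_mechanism(
    values: Sequence[float],
    sensitivity: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Perturb each value with Laplace(0, sensitivity / epsilon) noise before transmission."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # The difference of two i.i.d. Exponential(rate) variables is Laplace(0, 1/rate).
    return [v + rng.expovariate(rate) - rng.expovariate(rate) for v in values]


# Hypothetical local measurements of one subsystem, shared with a neighboring LFD.
measurements = [0.42, 0.37, 0.51]
print(laplace_mechanism(measurements, sensitivity=0.1, epsilon=1.0, rng=random.Random(7)))
```

    A smaller ε gives stronger privacy but noisier data, which a receiving fault detector would have to account for.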

  • 216.
    Sanchez, Carlos
    et al.
    Florida State Univ, Tallahassee, FL 32306 USA.
    Gavin, Peter
    Florida State Univ, Tallahassee, FL 32306 USA.
    Moreau, Daniel
    Chalmers, Gothenburg, Sweden.
    Själander, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. NTNU, Trondheim, Norway.
    Whalley, David
    Florida State Univ, Tallahassee, FL 32306 USA.
    Larsson-Edefors, Per
    Chalmers, Gothenburg, Sweden.
    McKee, Sally A.
    Chalmers, Gothenburg, Sweden.
    Redesigning a tagless access buffer to require minimal ISA changes2016In: Proc. 19th International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2016, article id 19Conference paper (Refereed)
  • 217.
    Sandberg, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient techniques for predicting cache sharing and throughput2012In: Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2012, p. 305-314Conference paper (Refereed)
    Abstract [en]

    This work addresses the modeling of shared cache contention in multicore systems and its impact on throughput and bandwidth. We develop two simple and fast cache sharing models for accurately predicting shared cache allocations for random and LRU caches.

    To accomplish this, we use low-overhead input data that captures the behavior of applications running on real hardware as a function of their shared cache allocation. This data enables us to determine how much and how aggressively data is reused by an application depending on how much shared cache it receives. From this we can model how applications compete for cache space, their aggregate performance (throughput), and bandwidth.

    We evaluate our models for two- and four-application workloads in simulation and on modern hardware. On a four-core machine, we demonstrate an average relative fetch ratio error of 6.7% for groups of four applications. We are able to predict workload bandwidth with an average relative error of less than 5.2% and throughput with an average error of less than 1.8%. The model can predict cache size with an average error of 1.3% compared to simulation.
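The following is a minimal steady-state sketch for a random-replacement cache that makes the idea of applications competing for cache space concrete. It is a simplified model of our own, not the authors'. It assumes fixed access rates and known miss-ratio curves (`mrcs`). It relies on the fact that under random replacement each miss evicts a uniformly chosen line, so each application's fill rate must balance the rate at which its own lines are evicted. All names are hypothetical.

```python
from typing import Callable, List

MissRatioCurve = Callable[[float], float]   # cache size (lines) -> miss ratio


def _occupancy(access_rate: float, mrc: MissRatioCurve, lam: float,
               cache_size: float) -> float:
    """Solve access_rate * mrc(s) == lam * s for s in [0, cache_size]."""
    if access_rate * mrc(cache_size) >= lam * cache_size:
        return cache_size
    lo, hi = 0.0, cache_size
    for _ in range(100):
        mid = (lo + hi) / 2
        if access_rate * mrc(mid) > lam * mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_cache_shares(access_rates: List[float], mrcs: List[MissRatioCurve],
                        cache_size: float) -> List[float]:
    """Steady-state occupancy s_i of each application in a shared
    random-replacement cache (simplified model, fixed access rates).

    Each miss evicts a random line, so app i loses lines at rate lam * s_i,
    where lam = total misses / cache_size. Balance: a_i * mrc_i(s_i) = lam * s_i,
    and the shares sum to cache_size.
    """
    def total(lam: float) -> float:
        return sum(_occupancy(a, m, lam, cache_size)
                   for a, m in zip(access_rates, mrcs))

    lo, hi = 0.0, 1.0
    while total(hi) >= cache_size:      # grow until the shares fit
        hi *= 2
    for _ in range(100):                # total(lam) decreases as lam grows
        mid = (lo + hi) / 2
        if total(mid) > cache_size:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return [_occupancy(a, m, lam, cache_size)
            for a, m in zip(access_rates, mrcs)]


mrc = lambda s: 1.0 / (1.0 + s)         # hypothetical miss-ratio curve
print(random_cache_shares([2.0, 1.0], [mrc, mrc], 1000.0))
```

In the paper's setting, access rates depend on performance. A fuller model would therefore iterate between this allocation and a throughput estimate.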

  • 218.
    Sandberg, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Modeling performance variation due to cache sharing2013In: Proc. 19th IEEE International Symposium on High Performance Computer Architecture, IEEE Computer Society, 2013, p. 155-166Conference paper (Refereed)
    Abstract [en]

    Shared cache contention can cause significant variability in the performance of co-running applications from run to run. This variability arises from different overlappings of the applications' phases, which can be the result of offsets in application start times or other delays in the system. Understanding this variability is important for generating an accurate view of the expected impact of cache contention. However, variability effects are typically ignored due to the high overhead of modeling or simulating the many executions needed to expose them.

    This paper introduces a method for efficiently investigating the performance variability due to cache contention. Our method relies on input data captured from native execution of applications running in isolation and a fast, phase-aware, cache sharing performance model. This allows us to assess the performance interactions and bandwidth demands of co-running applications by quickly evaluating hundreds of overlappings.

    We evaluate our method on a contemporary multicore machine and show that performance and bandwidth demands can vary significantly across runs of the same set of co-running applications. We show that our method can predict application slowdown with an average relative error of 0.41% (maximum 1.8%) as well as bandwidth consumption. Using our method, we can estimate an application pair's performance variation 213x faster, on average, than native execution.
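Evaluating many overlappings is cheap once per-phase behavior is known, and a simplified sketch shows why. Given each application's phase sequence from isolated runs and a hypothetical table of co-run slowdowns per phase pair, each start offset can be scored with a few table lookups. This illustration rests on stated assumptions and is not the paper's model. In particular, it ignores the time dilation that a slowdown causes, which a phase-aware model would need to account for.

```python
from typing import Dict, Sequence, Tuple

SlowdownTable = Dict[Tuple[str, str], float]


def slowdown_by_offset(phases_a: Sequence[str], phases_b: Sequence[str],
                       slowdown: SlowdownTable,
                       offsets: Sequence[int]) -> Dict[int, float]:
    """Average slowdown of application A for each start offset of B.

    phases_a / phases_b: phase label per fixed-length time window, e.g. from
    profiling each application in isolation. B is assumed to restart when it
    finishes, so its trace wraps around. slowdown[(pa, pb)] is A's slowdown
    when A is in phase pa and B in phase pb (hypothetical, precomputed).
    """
    result = {}
    for off in offsets:
        total = 0.0
        for t, pa in enumerate(phases_a):
            pb = phases_b[(t + off) % len(phases_b)]
            total += slowdown[(pa, pb)]
        result[off] = total / len(phases_a)
    return result


table = {("x", "p"): 1.0, ("x", "q"): 1.5, ("y", "p"): 1.2, ("y", "q"): 1.0}
r = slowdown_by_offset(["x", "y"], ["p", "q"], table, offsets=range(2))
print(min(r.values()), max(r.values()))
```

Even in this toy case, the same pair of applications shows no slowdown at one offset and a 35% slowdown at the other. That spread is the kind of variability the abstract describes.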

  • 219.
    Seipel, Stefan
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Lingfors, David
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Physics.
    Widén, Joakim
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Solid State Physics.
    Dual-domain visual exploration of urban solar potential2013In: Proc. Eurographics Workshop on Urban Data Modelling and Visualisation, 2013Conference paper (Other academic)
  • 220.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hiding and Reducing Memory Latency: Energy-Efficient Pipeline and Memory System Techniques2016Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Memory accesses in modern processors are both far slower and vastly more energy-expensive than the actual computations. To improve performance, processors spend a significant amount of energy and resources trying to hide and reduce the memory latency. To hide the latency, processors use out-of-order execution to overlap memory accesses with independent work and aggressive speculative instruction scheduling to execute dependent instructions back-to-back. To reduce the latency, processors use several levels of caching that keep frequently used data closer to the processor. However, these optimizations are not free. Out-of-order execution requires expensive processor resources, speculative scheduling must re-execute instructions on incorrect speculations, and multi-level caching requires extra energy and latency to search the cache hierarchy. This thesis investigates several energy-efficient techniques for: 1) hiding the latency in the processor pipeline, and 2) reducing the latency in the memory hierarchy.

    Much of the inefficiency of hiding latency in the processor comes from two sources. First, processors need several large and expensive structures to perform out-of-order execution (instruction queue, register file, etc.). These resources are typically allocated in program order, effectively giving all instructions equal priority. To reduce the size of these expensive resources without hurting performance, we propose Long Term Parking (LTP). LTP parks non-critical instructions before they allocate resources, thereby making room for critical memory-accessing instructions to continue and expose more memory-level parallelism. This enables us to save energy by shrinking the resource sizes without hurting performance. Second, when a load's data returns, the load's dependent instructions need to be scheduled and executed. To execute the dependent instructions back-to-back, the processor speculatively schedules instructions before it knows whether the input data will be available at execution time. To save energy, we investigate different scheduling techniques that reduce the number of re-executions due to misspeculation.

    The inefficiencies of traditional memory hierarchies come from the need to do level-by-level searches to locate data. The search starts at the L1 cache, then proceeds level by level until the data is found or determined not to be in any cache, at which point the processor has to fetch the data from main memory. This wastes time and energy for every level that is searched. To reduce the latency, we propose tracking the location of the data directly in a separate metadata hierarchy. This allows us to access the data directly without searching: the processor simply queries the metadata hierarchy for information about where the data is stored. Separating metadata into its own hierarchy brings a wide range of additional benefits, including flexibility in where data storage is placed in the hierarchy, the ability to store data in the hierarchy intelligently, direct access to remote cores, and many other data-oriented optimizations that can leverage our precise knowledge of where data are located.

    List of papers
    1. Long Term Parking (LTP): Criticality-aware Resource Allocation in OOO Processors
    2015 (English)In: Proc. 48th International Symposium on Microarchitecture, 2015Conference paper, Published paper (Refereed)
    Abstract [en]

    Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically allocated to instructions in program order. This wastes resources by allocating resources to instructions that are not yet ready to be executed and by eagerly allocating resources to instructions that are not part of the application’s critical path.

    This work explores the possibility of allocating pipeline resources only when needed to expose MLP, thereby enabling a processor design with significantly smaller structures without sacrificing performance. First, we identify the classes of instructions that should not reserve resources in program order and evaluate the potential performance gains we could achieve by delaying their allocations. We then use this information to “park” such instructions in a simpler, and therefore more efficient, Long Term Parking (LTP) structure. The LTP stores instructions until they are ready to execute, without allocating pipeline resources, and thereby keeps the pipeline available for instructions that can generate further MLP.

    LTP can accurately and rapidly identify which instructions to park, park them before they execute, wake them when needed to preserve performance, and do so using a simple queue instead of a complex IQ. We show that even a very simple queue-based LTP design allows us to significantly reduce IQ (64 → 32) and register file (128 → 96) sizes while retaining MLP performance and improving energy efficiency.

    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-272468 (URN)
    Conference
    MICRO 2015, December 5–9, Waikiki, HI
    Projects
    UPMARC, UART
    Available from: 2016-01-14 Created: 2016-01-14 Last updated: 2018-01-10
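A toy sketch of the allocation argument in the LTP abstract above. With in-order allocation, non-critical instructions fill the IQ and stall dispatch before later critical instructions (for example, loads) can enter. Parking the non-critical ones in a simple FIFO lets the critical ones through. The sketch assumes the `critical` labels are given and treats the LTP capacity as unbounded. It does not model wake-up from the LTP. `Insn` and `dispatch` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Insn:
    text: str
    critical: bool   # hypothetical label, e.g. a load or its address chain


def dispatch(insns: List[Insn], iq_capacity: int,
             use_ltp: bool) -> Tuple[List[Insn], Deque[Insn]]:
    """Allocate IQ entries in program order until the first instruction that
    cannot be placed. With use_ltp, non-critical instructions are parked in a
    FIFO that consumes no IQ entry (its capacity is not modeled)."""
    iq: List[Insn] = []
    ltp: Deque[Insn] = deque()
    for insn in insns:
        if use_ltp and not insn.critical:
            ltp.append(insn)           # parked: no IQ entry allocated
        elif len(iq) < iq_capacity:
            iq.append(insn)
        else:
            break                      # IQ full: in-order dispatch stalls here
    return iq, ltp


# "C" = critical, "N" = non-critical, in program order.
prog = [Insn(f"i{k}", c == "C") for k, c in enumerate("CNNNCNNC")]
for use_ltp in (False, True):
    iq, ltp = dispatch(prog, iq_capacity=3, use_ltp=use_ltp)
    print(use_ltp, sum(i.critical for i in iq), len(ltp))
```

With a three-entry IQ, the in-order run places only one critical instruction before stalling. The parking run places all three critical instructions and holds the five non-critical ones in the FIFO.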
    2. Cost-effective speculative scheduling in high performance processors
    2015 (English)In: Proc. 42nd International Symposium on Computer Architecture, New York: ACM Press, 2015, p. 247-259Conference paper, Published paper (Refereed)
    Abstract [en]

    To maximize performance, out-of-order execution processors sometimes issue instructions without a guarantee that operands will be available in time; e.g., loads are typically assumed to hit in the L1 cache and dependent instructions are issued accordingly. This form of speculation, which we refer to as speculative scheduling, has been used for two decades in real processors but has received little attention from the research community. In particular, as pipeline depth grows and the distance between the Issue and Execute stages increases, it becomes critical to issue instructions dependent on variable-latency instructions as soon as possible rather than wait for the actual cycle at which the result becomes available. Unfortunately, due to the uncertain nature of speculative scheduling, the scheduler may wrongly issue an instruction that will not have its source(s) available on the bypass network when it reaches the Execute stage. In that event, the instruction is canceled and replayed, potentially impairing performance and increasing energy consumption. In this work, we do not present a new replay mechanism. Rather, we focus on ways to reduce the number of replays that are agnostic of the replay scheme. First, we propose an easily implementable, low-cost solution to reduce the number of replays caused by L1 bank conflicts. Schedule shifting always assumes that, given a dual-load issue capacity, the second load issued in a given cycle will be delayed because of a bank conflict. Its dependents are thus always issued with the corresponding delay. Second, we improve on existing L1 hit/miss prediction schemes by taking instruction criticality into account. That is, for some criterion of criticality and for loads whose hit/miss behavior is hard to predict, we show that it is more cost-effective to stall dependents if the load is not predicted critical.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2015
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-272467 (URN)10.1145/2749469.2749470 (DOI)000380455700020 ()9781450334020 (ISBN)
    Conference
    ISCA 2015, June 13–17, Portland, OR
    Projects
    UPMARC, UART
    Available from: 2015-06-13 Created: 2016-01-14 Last updated: 2016-12-05Bibliographically approved
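The replay/delay trade-off behind schedule shifting can be sketched as follows. The sketch assumes a dual-load pipeline in which only the second load of a cycle can suffer a bank conflict, and in which a conflict costs one extra cycle (`BANK_CONFLICT_PENALTY` is a hypothetical value). Waking dependents too early causes a replay; waking them too late only wastes cycles. The criticality-aware hit/miss prediction is not modeled.

```python
from typing import List, Tuple

BANK_CONFLICT_PENALTY = 1   # hypothetical extra cycles for a conflicting load


def replays_and_lost_cycles(loads: List[Tuple[int, bool]],
                            schedule_shifting: bool) -> Tuple[int, int]:
    """loads: (issue slot within the cycle, 0 or 1; whether the load actually
    hit a bank conflict). Only the second load of a pair can be delayed.

    Dependents are woken assuming no delay, or, with schedule shifting,
    assuming BANK_CONFLICT_PENALTY extra cycles for every slot-1 load.
    Waking too early causes a replay; waking too late wastes cycles.
    """
    replays = lost = 0
    for slot, conflicted in loads:
        assumed = BANK_CONFLICT_PENALTY if (schedule_shifting and slot == 1) else 0
        actual = BANK_CONFLICT_PENALTY if conflicted else 0
        if assumed < actual:
            replays += 1                 # dependents issued before data arrives
        elif assumed > actual:
            lost += assumed - actual     # dependents wait longer than needed
    return replays, lost


trace = [(0, False), (1, True), (0, False), (1, False), (1, True)]
print(replays_and_lost_cycles(trace, False), replays_and_lost_cycles(trace, True))
```

On this short trace, the baseline replays both conflicting loads. Schedule shifting avoids both replays at the cost of one idle cycle for the conflict-free second load.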
    3. TLC: A tag-less cache for reducing dynamic first level cache energy
    2013 (English)In: Proceedings of the 46th International Symposium on Microarchitecture, New York: ACM Press, 2013, p. 49-61Conference paper, Published paper (Refereed)
    Abstract [en]

    First level caches are performance-critical and are therefore optimized for speed. To do so, modern processors reduce the miss ratio by using set-associative caches and optimize latency by reading all ways in parallel with the TLB and tag lookup. However, this wastes energy since only data from one way is actually used.

    To reduce energy, phased-caches and way-prediction techniques have been proposed wherein only data of the matching/predicted way is read. These optimizations increase latency and complexity, making them less attractive for first level caches.

    Instead of adding new functionality on top of a traditional cache, we propose a new cache design that adds way index information to the TLB. This allows us to: 1) eliminate extra data array reads (by reading the right way directly), 2) avoid tag comparisons (by eliminating the tag array), 3) filter out misses (by checking the TLB), and 4) amortize the TLB lookup energy (by integrating it with the way information). In addition, the new cache can directly replace existing caches without any modification to the processor core or software.

    This new Tag-Less Cache (TLC) reduces the dynamic energy for a 32 kB, 8-way cache by 60% compared to a VIPT cache without affecting performance.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2013
    National Category
    Computer Engineering Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-213236 (URN)10.1145/2540708.2540714 (DOI)978-1-4503-2638-4 (ISBN)
    Conference
    MICRO-46; December 7-11, 2013; Davis, CA, USA
    Projects
    UPMARC, CoDeR-MP
    Available from: 2013-12-07 Created: 2013-12-19 Last updated: 2018-01-11Bibliographically approved
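A minimal sketch of the TLC organization described above. It assumes 4 KiB pages, 64-byte lines, and a 64-set, 8-way (32 KiB) data array, so the set index comes from the page offset. The TLB entry stores a way index per line. A read therefore touches one way with no tag comparison, and a miss is known before any array access. Class and field names are hypothetical, and TLB misses and eviction bookkeeping are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LINE_SIZE = 64
PAGE_SIZE = 4096
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


@dataclass
class TlbEntry:
    ppn: int
    # Per-line way index inside the L1; None means the line is not cached.
    way: List[Optional[int]] = field(
        default_factory=lambda: [None] * LINES_PER_PAGE)


class TagLessL1:
    """Data array only: there is no tag array; the TLB says which way holds a line."""

    def __init__(self, num_sets: int = 64, num_ways: int = 8) -> None:
        self.num_sets = num_sets
        self.data: List[List[Optional[bytes]]] = [
            [None] * num_ways for _ in range(num_sets)]
        self.tlb: Dict[int, TlbEntry] = {}

    def _locate(self, vaddr: int) -> Tuple[TlbEntry, int, int]:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        line = offset // LINE_SIZE
        return self.tlb[vpn], line, line % self.num_sets   # TLB miss not modeled

    def read(self, vaddr: int) -> Optional[bytes]:
        entry, line, set_idx = self._locate(vaddr)
        way = entry.way[line]
        if way is None:
            return None                  # miss known from the TLB, no array access
        return self.data[set_idx][way]   # read exactly one way, no tag compare

    def fill(self, vaddr: int, way: int, block: bytes) -> None:
        entry, line, set_idx = self._locate(vaddr)
        self.data[set_idx][way] = block
        entry.way[line] = way            # invalidating the previous owner not modeled


l1 = TagLessL1()
l1.tlb[1] = TlbEntry(ppn=7)              # hypothetical mapping: VPN 1 -> PPN 7
l1.fill(PAGE_SIZE + 128, way=3, block=b"X")
print(l1.read(PAGE_SIZE + 128))
```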
    4. The Direct-to-Data (D2D) Cache: Navigating the cache hierarchy with a single lookup
    2014 (English)In: Proc. 41st International Symposium on Computer Architecture, Piscataway, NJ: IEEE Press, 2014, p. 133-144Conference paper, Published paper (Refereed)
    Abstract [en]

    Modern processors optimize for cache energy and performance by employing multiple levels of caching that address bandwidth, low-latency and high-capacity. A request typically traverses the cache hierarchy, level by level, until the data is found, thereby wasting time and energy in each level. In this paper, we present the Direct-to-Data (D2D) cache that locates data across the entire cache hierarchy with a single lookup.

    To navigate the cache hierarchy, D2D extends the TLB with per cache-line location information that indicates in which cache and way the cache line is located. This allows the D2D cache to: 1) skip levels in the hierarchy (by accessing the right cache level directly), 2) eliminate extra data array reads (by reading the right way directly), 3) avoid tag comparisons (by eliminating the tag arrays), and 4) go directly to DRAM on cache misses (by checking the TLB). This reduces the L2 latency by 40% and saves 5-17% of the total cache hierarchy energy.

    D2D's lower L2 latency directly improves L2-sensitive applications' performance by 5-14%. More significantly, we can take advantage of the L2 latency reduction to optimize other parts of the microarchitecture. For example, we can reduce the ROB size for the L2 bound applications by 25%, or we can reduce the L1 cache size, delivering an overall 21% energy savings across all benchmarks, without hurting performance.

    Place, publisher, year, edition, pages
    Piscataway, NJ: IEEE Press, 2014
    National Category
    Computer Engineering Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-235362 (URN)10.1145/2678373.2665694 (DOI)000343652800012 ()978-1-4799-4394-4 (ISBN)
    Conference
    ISCA 2014, June 14–18, Minneapolis, MN
    Projects
    UPMARC, CoDeR-MP
    Available from: 2014-06-14 Created: 2014-10-31 Last updated: 2018-01-11Bibliographically approved
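Counting cache lookups shows the difference between a level-by-level walk and a D2D-style single lookup. The sketch assumes a two-level hierarchy and a hypothetical `(level, way)` encoding for the location information held in the extended TLB entry. It illustrates only the lookup path, not the paper's energy or latency figures.

```python
from typing import Dict, Sequence, Tuple

LEVELS = ("L1", "L2")            # cache levels, nearest first
Location = Tuple[str, int]       # (level or "DRAM", way); hypothetical encoding


def probes_level_by_level(level_holding_line: str,
                          levels: Sequence[str] = LEVELS) -> int:
    """Cache lookups made by a conventional walk: L1, then L2, ... until a hit."""
    if level_holding_line in levels:
        return levels.index(level_holding_line) + 1
    return len(levels)            # every level missed before going to DRAM


def d2d_access(location_info: Dict[Tuple[int, int], Location],
               vpn: int, line: int) -> Tuple[str, int, int]:
    """Single lookup: the extended TLB entry names the level and way directly.
    Returns (level, way, number of cache lookups)."""
    level, way = location_info.get((vpn, line), ("DRAM", -1))
    return level, way, 0 if level == "DRAM" else 1


loc = {(5, 0): ("L1", 2), (5, 1): ("L2", 6)}   # hypothetical TLB location bits
print(probes_level_by_level("L2"), d2d_access(loc, 5, 1))
```

For a line in the L2, the walk costs two lookups and the direct access one. For an uncached line, the walk still probes both levels, while the direct path goes straight to DRAM.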
    5. A split cache hierarchy for enabling data-oriented optimizations
    2017 (English)In: Proc. 23rd International Symposium on High Performance Computer Architecture, IEEE Computer Society, 2017, p. 133-144Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE Computer Society, 2017
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-306368 (URN)10.1109/HPCA.2017.25 (DOI)000403330300012 ()978-1-5090-4985-1 (ISBN)
    Conference
    HPCA 2017, February 4–8, Austin, TX
    Projects
    UPMARC
    Available from: 2017-05-08 Created: 2016-10-27 Last updated: 2019-03-08
  • 221.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Phase Behavior in Serial and Parallel Applications2012In: International Symposium on Workload Characterization (IISWC'12), IEEE Computer Society, 2012Conference paper (Refereed)
  • 222.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Phase Guided Profiling for Fast Cache Modeling2012In: International Symposium on Code Generation and Optimization (CGO'12), ACM Press, 2012, p. 175-185Conference paper (Refereed)
    Abstract [en]

    Statistical cache models are powerful tools for understanding application behavior as a function of cache allocation. However, previous techniques have modeled only the average application behavior, which hides the effect of program variations over time. Without detailed time-based information, transient behavior, such as exceeding bandwidth or cache capacity, may be missed. Yet these events, while short, often play a disproportionate role and are critical to understanding program behavior.

    In this work we extend earlier techniques to incorporate program phase information when collecting runtime profiling data. This allows us to model an application's cache miss ratio as a function of its cache allocation over time. To reduce overhead and improve accuracy we use online phase detection and phase-guided profiling. The phase-guided profiling reduces overhead by more intelligently selecting portions of the application to sample, while accuracy is improved by combining samples from different instances of the same phase.

    The result is a new technique that accurately models the time-varying behavior of an application's miss ratio as a function of its cache allocation on modern hardware. By leveraging phase-guided profiling, this work both improves on the accuracy of previous techniques and reduces the overhead.
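A minimal sketch of the phase-guided sampling idea: profile an interval only until its phase has enough samples, then reuse the per-phase average for every later instance of that phase. Here `measure` stands in for a costly profiling step and returns a single number. The paper instead models a miss ratio as a function of cache allocation, i.e. a whole curve. The phase labels are assumed to come from an online phase detector, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def phase_guided_profile(phase_ids: Sequence[str],
                         measure: Callable[[int], float],
                         samples_per_phase: int) -> Tuple[List[float], int]:
    """Sample only until each phase has samples_per_phase samples, then
    reuse the per-phase average for every later instance of that phase.

    phase_ids: phase label of each execution interval (assumed given).
    measure(t): costly profiling of interval t, e.g. a miss ratio.
    Returns (estimated value per interval, number of intervals profiled).
    """
    if samples_per_phase < 1:
        raise ValueError("need at least one sample per phase")
    samples: Dict[str, List[float]] = defaultdict(list)
    profiled = 0
    for t, pid in enumerate(phase_ids):
        if len(samples[pid]) < samples_per_phase:
            samples[pid].append(measure(t))     # combine instances of a phase
            profiled += 1
    per_phase = {pid: mean(vals) for pid, vals in samples.items()}
    return [per_phase[pid] for pid in phase_ids], profiled


vals = [0.1, 0.2, 0.5, 0.3, 0.6, 0.7]           # hypothetical measurements
timeline, n = phase_guided_profile(["A", "A", "B", "A", "B", "B"],
                                   lambda t: vals[t], samples_per_phase=2)
print(n, timeline)
```

Here only four of six intervals are profiled, yet every interval receives an estimate from its phase's average.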

  • 223.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Carlson, Trevor E.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    POSTER: Putting the G back into GPU/CPU Systems Research2017In: 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2017, p. 130-131Conference paper (Refereed)
    Abstract [en]

    Modern SoCs contain several CPU cores and many GPU cores to execute both general-purpose and highly parallel graphics workloads. In many SoCs, more area is dedicated to graphics than to general-purpose compute. Despite this, the micro-architecture research community primarily focuses on GPGPU and CPU-only research, and not on graphics (the primary workload for many SoCs). The main reason for this is the lack of efficient tools and simulators for modern graphics applications. This work focuses on the GPU's memory traffic generated by graphics. We describe a new graphics tracing framework and use it to study both graphics applications' memory behavior and how CPUs and GPUs affect system performance. Our results show that graphics applications exhibit a wide range of memory behavior between applications and across time, and slow down co-running SPEC applications by 59% on average.

  • 224.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Eklöv, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient software-based online phase classification2011In: International Symposium on Workload Characterization (IISWC'11), IEEE Computer Society, 2011, p. 104-115Conference paper (Refereed)
    Abstract [en]

    Many programs exhibit execution phases with time-varying behavior. Phase detection has been used extensively to find short, representative simulation points that quickly yield representative simulation results for long-running applications. Several hardware-assisted phase detection schemes have also been proposed to guide various forms of optimization and hardware configuration. This paper explores the feasibility of low-overhead phase detection at runtime based entirely on existing features found in modern processors. If successful, such a technology would be useful for cache management, frequency adjustment, runtime scheduling, and profiling techniques. The paper evaluates several existing and new alternatives for efficient runtime data collection and online phase detection. ScarPhase (Sample-based Classification and Analysis for Runtime Phases), a new online phase detection library, is presented. It makes extensive use of the new hardware counter features, introduces a new phase classification heuristic, and suggests a way to dynamically adjust the sample rate. ScarPhase exhibits a runtime overhead below 2%.
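The following hedged illustration of online phase classification assigns each interval to the nearest known phase by Manhattan distance between normalized signatures. When no known phase lies within a threshold, it starts a new phase. The signature format, the `threshold` value, and the function names are assumptions. This is a generic nearest-centroid scheme, not ScarPhase's own heuristic, and it does not model dynamic sample-rate adjustment.

```python
from typing import Dict, List


def normalize(counts: Dict[int, int]) -> Dict[int, float]:
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty signature")
    return {k: v / total for k, v in counts.items()}


def manhattan(a: Dict[int, float], b: Dict[int, float]) -> float:
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def classify_interval(counts: Dict[int, int], phases: List[Dict[int, float]],
                      threshold: float = 0.2) -> int:
    """Assign an interval to the closest known phase, or start a new phase.

    counts: hypothetical signature of one interval, e.g. how often each code
    region was hit by sampled instruction addresses (hardware counters).
    phases: signatures of known phases; extended in place for new phases.
    """
    sig = normalize(counts)
    best, best_dist = -1, float("inf")
    for pid, centroid in enumerate(phases):
        d = manhattan(sig, centroid)
        if d < best_dist:
            best, best_dist = pid, d
    if best_dist <= threshold:
        return best
    phases.append(sig)                 # no phase is close enough: new phase
    return len(phases) - 1


known: List[Dict[int, float]] = []
intervals = [{0x400: 90, 0x800: 10}, {0x400: 88, 0x800: 12}, {0x900: 100}]
ids = [classify_interval(c, known) for c in intervals]
print(ids)
```

The first two intervals share a code profile and land in the same phase. The third executes different code and opens a new phase.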

  • 225.
    Sembrant, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    TLC: A tag-less cache for reducing dynamic first level cache energy2013In: Proceedings of the 46th International Symposium on Microarchitecture, New York: ACM Press, 2013, p. 49-61Conference paper (Refereed)
    Abstract [en]

    First level caches are performance-critical and are therefore optimized for speed. To do so, modern processors reduce the miss ratio by using set-associative caches and optimize latency by reading all ways in parallel with the TLB and tag lookup. However, this wastes energy since only data from one way is actually used.

    To reduce energy, phased-caches and way-prediction techniques have been proposed wherein only data of the matching/predicted way is read. These optimizations increase latency and complexity, making them less attractive for first level caches.

    Instead of adding new functionality on top of a traditional cache, we propose a new cache design that adds way index information to the TLB. This allows us to: 1) eliminate extra data array reads (by reading the right way directly), 2) avoid tag comparisons (by eliminating the tag array), 3) filter out misses (by checking the TLB), and 4) amortize the TLB lookup energy (by integrating it with the way information). In addition, the new cache can directly replace existing caches without any modification to the processor core or software.

    This new Tag-Less Cache (TLC) reduces the dynamic energy for a 32 kB, 8-way cache by 60% compared to a VIPT cache without affecting performance.

  • 226.
    Själander, Magnus
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Borgström, Gustaf
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Improving Error-Resilience of Emerging Multi-Value TechnologiesManuscript (preprint) (Other academic)
    Abstract [en]

    There are extensive ongoing research efforts on emerging technologies that have the potential to become an alternative to today’s CMOS technologies. A common feature among the investigated technologies is multi-value devices and the possibility of implementing quaternary logic and memory. However, multi-value devices tend to be more sensitive to interference and thus have reduced error resilience. We present an architecture based on multi-value devices where we can trade energy efficiency against error resilience. Important data are encoded in a more robust binary format, while error-tolerant data are encoded in a quaternary format. We show for eight benchmarks an energy reduction of 32% and 36% for the register file and level-one data cache, respectively, and, for the two integer benchmarks, an energy reduction for arithmetic operations of 13% and 23%. We also show that for a quaternary technology to be viable, it needs to have a raw bit error rate of one error in 100 million or better.
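A minimal sketch of the binary/quaternary split. It assumes each quaternary cell stores two bits as one of four levels and each binary cell stores one bit. `to_cells`, `from_cells`, `store`, and the `important` flag are hypothetical names, and the abstract does not specify how data are classified.

```python
from typing import List


def to_cells(value: int, bits: int, quaternary: bool) -> List[int]:
    """Split a non-negative bits-wide value into memory cells, least
    significant cell first: one bit per binary cell, or two bits per
    quaternary (four-level) cell."""
    if value < 0 or value >= 1 << bits:
        raise ValueError("value does not fit in the given width")
    base = 4 if quaternary else 2
    n_cells = (bits + 1) // 2 if quaternary else bits
    cells = []
    for _ in range(n_cells):
        cells.append(value % base)
        value //= base
    return cells


def from_cells(cells: List[int], quaternary: bool) -> int:
    base = 4 if quaternary else 2
    value = 0
    for digit in reversed(cells):      # most significant cell first
        value = value * base + digit
    return value


def store(value: int, bits: int, important: bool) -> List[int]:
    """Hypothetical policy: important data get the robust binary format,
    error-tolerant data the denser quaternary format."""
    return to_cells(value, bits, quaternary=not important)


print(len(store(0xDEADBEEF, 32, important=True)),
      len(store(0xDEADBEEF, 32, important=False)))
```

A 32-bit value needs 16 quaternary cells instead of 32 binary ones. The price is that a single misread four-level cell can corrupt up to two bits.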

  • 227.
    Själander, Magnus
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Borgström, Gustaf
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Klymenko, Mykhailo V.
    Remacle, Françoise
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Techniques for modulating error resilience in emerging multi-value technologies2016In: Proc. 13th International Conference on Computing Frontiers, New York: ACM Press, 2016, p. 55-63Conference paper (Refereed)
  • 228.
    Själander, Magnus
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Martonosi, Margaret
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Power-Efficient Computer Architectures: Recent Advances2014Book (Refereed)
  • 229.
    Soleiman, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Battery-free Visible Light Sensing2019Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    In this thesis, we show that it is possible to design a battery-free light sensing system that can sense and communicate hand gestures while operating entirely on power harvested from indoor light. We present two main innovations that bring our system's power consumption down to tens of microwatts, enabling battery-free operation. First, we introduce a novel visible light sensing system that tracks variations in light intensity using a solar cell as the sensor. Unlike photodiodes, solar cells are optimized for energy yield in the visible light region and hence do not require power-hungry active components such as an operational amplifier. Furthermore, solar cells can operate under more diverse light conditions, as they are not susceptible to saturation under bright light. Second, we devise two ultra-low power communication mechanisms based on radio frequency backscatter that transmit sensor readings at various resolutions without the need for energy-expensive computational blocks. We design two battery-free, self-powered hardware prototypes based on these two innovations. Our first design uses an on-board comparator-based circuit to perform a 1-bit digitization of changes in light readings, consuming less than a microwatt of power for digitization. For our second prototype, we design an analog backscatter mechanism that maps raw sensor readings directly to backscatter transmissions. We demonstrate the feasibility of our designs by sensing significant changes in light intensity caused by shadows from hand gestures and reconstructing these at a receiving device. Our results demonstrate the ability to sense and communicate various hand gestures at a peak power of 20 microwatts when performing 1-bit digitization, and a mean power of 60 microwatts when performing analog backscatter. Both designs represent orders-of-magnitude improvements in power consumption over state-of-the-art visible light sensing systems.
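
    The 1-bit digitization step can be sketched in software, assuming the solar-cell voltage is available as a list of samples and that the comparator compares each reading against a slowly adapting reference (roughly what an RC-filtered reference input would do). The function name, the smoothing factor `alpha`, and the fractional `margin` are hypothetical; the thesis's circuit is analog and may work differently.

```python
def digitize_shadows(samples: list[float], alpha: float = 0.05, margin: float = 0.1) -> list[int]:
    """Emit 1 while a reading sits more than `margin` (fractional) below a slowly
    tracking reference, i.e. while a shadow darkens the cell; otherwise emit 0."""
    if not samples:
        return []
    reference = samples[0]
    bits = []
    for v in samples:
        # Comparator decision against the current reference level
        bits.append(1 if v < reference * (1.0 - margin) else 0)
        # Exponential moving average: the reference follows slow ambient changes
        reference += alpha * (v - reference)
    return bits
```

    A hand passing over the cell, modeled as `[1.0] * 5 + [0.5] * 3 + [1.0] * 5`, yields three 1s in the middle; slow ambient drift is absorbed by the moving reference instead of triggering the output.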

  • 230.
    Spiliopoulos, Vasileios
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Bagdia, Akash
    Hansson, Andreas
    Aldworth, Peter
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Introducing DVFS-Management in a Full-System Simulator2013In: Proc. 21st International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, IEEE Computer Society, 2013Conference paper (Refereed)
    Abstract [en]

    Dynamic Voltage and Frequency Scaling (DVFS) is an essential part of controlling the power consumption of any computer system, from mobile phones to servers. DVFS efficiency relies on hardware-software co-optimization, so existing hardware cannot reveal the full optimization potential beyond the current implementation’s characteristics. To explore the vast DVFS design space, which straddles software and hardware, a simulation infrastructure must provide features that are not readily available today, for example: software-controllable clock and voltage domains, support for the OS and its frequency-scaling module, and an online power estimation methodology. As the main contribution, this work enables DVFS studies in a full-system simulator. We extend the gem5 simulator to support full-system DVFS modeling. By doing so, we enable energy-efficiency experiments to be performed in gem5, and we showcase such studies. Finally, we show that both existing and novel frequency governors for Linux and Android can be effortlessly integrated in the framework, and we evaluate the efficiency of different DVFS schemes.
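
    To make the notion of a frequency governor concrete, here is a simplified policy loosely modeled on Linux's ondemand governor: jump to the highest operating point when utilization crosses a threshold, otherwise scale proportionally and round up to an available frequency. This is a sketch under stated assumptions, not gem5's or the kernel's code; the function name, the 0.8 threshold, and the example frequency table are hypothetical.

```python
from bisect import bisect_left


def ondemand_like(load: float, freqs_mhz: list[int], up_threshold: float = 0.8) -> int:
    """Pick the next frequency (MHz) from the available operating points.

    load: CPU utilization in [0, 1] over the last sampling window.
    """
    freqs = sorted(freqs_mhz)
    if load >= up_threshold:
        return freqs[-1]  # busy: go straight to the highest operating point
    # Otherwise scale linearly between min and max, then round up to an available point
    target = freqs[0] + load * (freqs[-1] - freqs[0])
    idx = bisect_left(freqs, target)
    return freqs[min(idx, len(freqs) - 1)]
```

    With operating points `[200, 400, 800, 1200]`, a load of 0.5 targets 700 MHz and is rounded up to 800 MHz, while a load of 0.9 selects 1200 MHz.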

  • 231.
    Spiliopoulos, Vasileios
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Power-Sleuth: A Tool for Investigating your Program's Power Behavior2012In: International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'12), 2012, p. 241-250Conference paper (Refereed)
    Abstract [en]

    Modern processors support aggressive power-saving techniques to reduce energy consumption. However, traditional profiling techniques have mainly focused on performance, which does not accurately reflect the power behavior of applications. For example, the longest-running function is not always the most energy-hungry function. Thus, software developers cannot always take full advantage of these power-saving features.

    We present Power-Sleuth, a power/performance estimation tool that can provide a full description of an application's behavior at any frequency from a single profiling run. The tool combines three techniques, a power estimation model and a performance estimation model together with a program phase detection technique, to deliver accurate per-phase, per-frequency analysis.

    Our evaluation (against real power measurements) shows that we can accurately predict power and performance across different frequencies, with average errors of 3.5% and 3.9%, respectively.
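
    Predicting behavior at other frequencies from a single run needs some model of how time and power scale with frequency. The sketch below uses a common first-order model, which may differ from Power-Sleuth's actual models: the compute-bound share of a phase's time scales as 1/f, the memory-bound share is frequency-independent, and dynamic power scales with V²f. The function names and the per-phase memory fraction are assumptions for illustration.

```python
def predict_time(t_measured: float, f_measured: float, mem_fraction: float, f_target: float) -> float:
    """Predict a phase's run time at f_target from one run at f_measured.

    mem_fraction is the (assumed known) share of the measured time spent
    stalled on memory, which does not shrink when the core clock rises.
    """
    compute = t_measured * (1.0 - mem_fraction)
    memory = t_measured * mem_fraction
    return compute * (f_measured / f_target) + memory


def predict_phases(phases: list[tuple[float, float, float]], f_target: float) -> float:
    """Sum per-phase predictions; each phase is (time_s, f_measured, mem_fraction)."""
    return sum(predict_time(t, f, m, f_target) for t, f, m in phases)


def predict_dynamic_power(p_measured: float, v_measured: float, f_measured: float,
                          v_target: float, f_target: float) -> float:
    """Scale dynamic power with the classic P ~ C * V^2 * f relation."""
    return p_measured * (v_target / v_measured) ** 2 * (f_target / f_measured)
```

    Per-phase modeling matters because a memory-bound phase barely slows down at a lower frequency, while a compute-bound phase slows down proportionally.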

  • 232.
    Sun, Jinghao
    et al.
    Hong Kong Polytech Univ, Hong Kong, Hong Kong, Peoples R China; Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Guan, Nan
    Hong Kong Polytech Univ, Hong Kong, Hong Kong, Peoples R China.
    Wang, Yang
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    He, Qingqiang
    Hong Kong Polytech Univ, Hong Kong, Hong Kong, Peoples R China.
    Wang, Yi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Real-Time Scheduling and Analysis of OpenMP Task Systems with Tied Tasks2017In: 2017 IEEE Real-Time Systems Symposium (RTSS), IEEE, 2017, p. 92-103Conference paper (Refereed)
    Abstract [en]

    OpenMP is a promising framework for developing parallel real-time software on multi-cores. Although similar to the DAG task model, OpenMP task systems are significantly more difficult to analyze due to constraints imposed by the OpenMP specification. An important feature of OpenMP is tied tasks, which must execute on the same thread during their whole life cycle. Although tied tasks offer benefits in simplicity and efficiency, they were considered unsuitable for real-time systems due to their complex behavior. In this paper, we study the real-time scheduling and analysis of OpenMP task systems with tied tasks. First, we show that under the existing scheduling algorithms in OpenMP, tied tasks may indeed lead to extremely bad timing behavior, where the parallel workload is executed completely sequentially. To solve this problem, we propose a new scheduling algorithm and develop two response-time bounds for it, with different trade-offs between simplicity and analysis precision. Experiments with both randomly generated OpenMP task systems and realistic OpenMP programs show that the response-time bounds obtained by our approach for tied task systems are very close to those of untied tasks.
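
    For reference, the classic bound for a single DAG job on m cores under any work-conserving scheduler is R ≤ len + (vol − len)/m, where vol is the total work and len is the critical-path length. The sketch below computes it; it is Graham's textbook bound, not either of the paper's two bounds, and the paper's point is precisely that tied tasks under existing OpenMP schedulers can behave far worse than such an ideal (even fully sequentially). Node names and WCET values in the example are hypothetical.

```python
from functools import lru_cache


def graham_bound(wcet: dict[str, float], succ: dict[str, list[str]], m: int) -> float:
    """Work-conserving response-time bound len + (vol - len) / m for one DAG job."""
    vol = sum(wcet.values())  # total work of all nodes

    @lru_cache(maxsize=None)
    def longest_from(v: str) -> float:
        # Longest (critical) path starting at v, including v's own WCET
        return wcet[v] + max((longest_from(s) for s in succ.get(v, [])), default=0.0)

    length = max(longest_from(v) for v in wcet)
    return length + (vol - length) / m
```

    For a fork-join DAG a(1) → {b(2), c(3)} → d(1) on two cores, vol = 7 and len = 5, so the bound is 5 + 2/2 = 6.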

  • 233.
    Sveinsson, Ólafur Björgvin
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signals and Systems Group.
    Measurement setup for High Power Impulse Magnetron Sputtering2011Independent thesis Basic level (professional degree), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    Recently, the material physics group at the Science Institute of the University of Iceland has been using reactive sputtering to grow thin films for various research projects at the institute. These films have been grown using dc sputtering, which has proven to be a very successful method. High power impulse magnetron sputtering (HiPIMS) is a new pulsed-power sputtering method in which short, high-power pulses are used for sputtering instead of a lower, steady power.

    The project resulted in a functional system capable of growing thin films using HiPIMS. Thin films grown with high-power pulses have a higher film density and other favorable properties compared to films grown using direct current magnetron sputtering.
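
    Because HiPIMS concentrates power into short pulses, a measurement setup typically has to relate peak and average power. The sketch below shows two common ways to compute it: the idealized rectangular-pulse relation P_avg = P_peak × duty cycle, and pulse energy from sampled voltage and current divided by the repetition period. The function names and the example numbers are hypothetical, not values from the thesis; real pulses are not rectangular, and taking the magnitude of v·i is a simplification that sidesteps sign conventions.

```python
def average_power_rectangular(peak_power_w: float, pulse_width_s: float, rep_rate_hz: float) -> float:
    """Mean power of an idealized rectangular pulse train: P_avg = P_peak * duty cycle."""
    duty = pulse_width_s * rep_rate_hz
    if not 0.0 < duty <= 1.0:
        raise ValueError("pulse width x repetition rate must lie in (0, 1]")
    return peak_power_w * duty


def average_power_sampled(voltage_v: list[float], current_a: list[float],
                          dt_s: float, period_s: float) -> float:
    """Mean power from one sampled pulse: integrate |v * i| over the pulse
    (rectangle rule) and divide the pulse energy by the repetition period."""
    energy_j = sum(abs(v * i) for v, i in zip(voltage_v, current_a)) * dt_s
    return energy_j / period_s
```

    For instance, hypothetical 100 kW pulses lasting 100 µs at 100 Hz give a 1% duty cycle and about 1 kW average power, which is why HiPIMS can reach very high instantaneous power without overheating the target.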

  • 234.
    Svetoft, John
    Gotland University, School of Game Design, Technology and Learning Processes.
    Super-modular Textures: Comparisons in Practical Applications2012Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    The purpose of this thesis is to compare the products of two different workflows in a practical environment. The workflows in question are building environments using super-modular textures versus the more conventional method of unique textures. The tests try to establish whether the theory behind the super-modular workflow holds up in practice, i.e., whether the result is actually as optimized as the theory claims. The data of interest are gathered through tests in the Unreal Development Kit, a free version of the commercial Unreal Engine. Results are compiled into graphs to give a clear overview of the differences between the two products. The results show that, in the context of the tests performed, the super-modular workflow allocates less memory than can be achieved using uniquely mapped assets, and that the average number of draw calls remains the same regardless of method.
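
    The memory side of the comparison comes down to how many distinct textures are resident. The sketch below estimates texture memory for uncompressed textures with a full mipmap chain, assuming 4 bytes per pixel (RGBA8) and ignoring the block compression a real engine would usually apply. The function names are hypothetical and this is not how the thesis measured memory in UDK.

```python
def texture_bytes(width: int, height: int, bytes_per_pixel: int = 4, mipmaps: bool = True) -> int:
    """Bytes for one uncompressed texture, optionally including its full mip chain."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            break
        # Each mip level halves both dimensions, clamped at 1 pixel
        w, h = max(1, w // 2), max(1, h // 2)
    return total


def scene_texture_memory(distinct_textures: list[tuple[int, int]]) -> int:
    """Sum over the distinct textures loaded; reused (tiled) textures count once."""
    return sum(texture_bytes(w, h) for w, h in distinct_textures)
```

    A 1024×1024 RGBA8 texture with mipmaps takes 5,592,404 bytes (about 4/3 of the base level), so a scene built from a few shared, tileable textures scales with that small set rather than with the number of uniquely mapped assets.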

  • 235.
    Tachihara, S
    et al.
    Tokyo University of Science.
    Yamaguchi, T
    Tokyo University of Science.
    Ishiura, N
    Harada, T
    Dinomais, M
    Richard, P
    Nguyen, S
    Bachelder, Steven
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Hayashi, Masaki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Nakajima, Masayuki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Fors, U
    Characterization of engagement changes during VR based rehabilitation: A preliminary study2016Conference paper (Other academic)
  • 236.
    Takeuchi, Kohei
    et al.
    Osaka Inst Technol, Fac Informat Sci & Technol, Osaka, Japan.
    Hayashi, Masaki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Hirayama, Makoto J.
    Osaka Inst Technol, Fac Informat Sci & Technol, Osaka, Japan.
    Development of VR museum and a comparison with the screen-based virtual museum2019In: INTERNATIONAL WORKSHOP ON ADVANCED IMAGE TECHNOLOGY (IWAIT) 2019 / [ed] Kemao, Q; Hayase, K; Lau, PY; Lie, WN; Lee, YL; Srisuk, S; Yu, L, SPIE-INT SOC OPTICAL ENGINEERING, 2019, article id 110491BConference paper (Refereed)
    Abstract [en]

    We have been researching and developing virtual museums that let a user appreciate artworks realistically and with little stress, as if visiting an actual museum. This time, as a method to further increase the realism of the virtual museum, we developed a virtual museum using a head-mounted display. We used the Oculus Rift and the Unity game engine to develop this VR museum. We also compared the developed VR museum with the screen-based virtual museum in order to maximize the user experience of the virtual museum. Four aspects, "Freedom of operation", "Immersion", "Comfort of play", and "Picture quality", are examined to clarify which type of virtual museum suits a given user's needs.

  • 237.
    tariq, tariq
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Information Systems.
    GUI Application for measuring instrument: Noise measurement system2013Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    The ever-growing demands on the electronics design of modern electron microscopes increase the requirements on measurement tasks in the electronics development of these systems. In this thesis, we report the findings of designing a noise measurement setup at Carl Zeiss, Oberkochen. The aim of this thesis was to explore the design of a noise measurement setup and to provide an interface that helps analyze these measurements using C# and an Agilent multimeter. This was achieved by constructing and evaluating a prototype of a noise measurement application. For this purpose, Design Science Research (DSR) was conducted, situated in the domain of noise measurement research. The results consist of a set of design principles expressing key aspects that need to be addressed when designing noise measurement functionality. Each artifact derived from the development and evaluation process constitutes an example of how to design noise measurement functionality of this kind.
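
    The analysis side of such an interface usually reduces a series of multimeter readings to a few noise figures. The sketch below assumes the readings (in volts, taken at a fixed rate) have already been transferred from the instrument; it does not talk to any instrument, and it is written in Python rather than the thesis's C#. The function name and the chosen figures (mean, RMS noise as the sample standard deviation, peak-to-peak) are illustrative assumptions.

```python
import statistics


def noise_stats(samples_v: list[float]) -> dict[str, float]:
    """Summarize DC voltage readings: mean level, RMS noise, and peak-to-peak noise.

    rms_noise is the sample standard deviation, i.e. the RMS of the AC part
    left after removing the mean (DC) level.
    """
    if len(samples_v) < 2:
        raise ValueError("need at least two readings")
    return {
        "mean": statistics.mean(samples_v),
        "rms_noise": statistics.stdev(samples_v),
        "peak_to_peak": max(samples_v) - min(samples_v),
    }
```

    Reporting both RMS and peak-to-peak values is a common choice: RMS characterizes broadband noise, while peak-to-peak exposes occasional spikes that RMS averages away.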

  • 238.
    Theodoro, Thainan S.
    et al.
    Federal University of Juiz de Fora, Department of Electrical Engineering, Brazil.
    Tomim, Marcelo A.
    Federal University of Juiz de Fora, Department of Electrical Engineering, Brazil.
    Barbosa, Pedro G.
    Federal University of Juiz de Fora, Department of Electrical Engineering, Brazil.
    Lima, Antonio C.S.
    Federal University of Rio de Janeiro, Department of Electrical Engineering, Brazil.
    de Santiago Ochoa, Juan
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Electricity.
    A Hybrid Simulation Tool for Distributed Generation Integration Studies2018In: 2018 Power Systems Computation Conference (PSCC) / [ed] IEEE, Dublin, Ireland: IEEE conference proceedings, 2018Conference paper (Refereed)
    Abstract [en]

    This work presents a hybrid simulation tool that combines the fast analysis of quasi-static time-series or transient stability programs with electromagnetic transient programs to evaluate the dynamic behaviour of electric power systems with distributed generation sources. The interaction between the two programs is performed by means of controllable current and voltage sources, which are used to interface the external and detailed systems. The double second-order generalized integrator (DSOGI) is used to extract the positive-sequence phasor from the detailed system to the external one. A local network server controls the data communication between the two simulation environments by means of the TCP/IP protocol. In the present paper, the proposed tool is used to simulate the integration of a wind power plant, based on a doubly-fed induction generator (DFIG), into a 29-bus electrical network. Results and computational timings are then compared with those obtained with an electromagnetic transients program, demonstrating the accuracy and speed of the proposed strategy.
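
    The positive-sequence quantity that the DSOGI delivers rests on the symmetrical-components (Fortescue) relation V+ = (Va + a·Vb + a²·Vc)/3 with a = e^(j2π/3). The sketch below applies that relation to phasors directly; it does not implement the SOGI filters that a DSOGI uses to obtain those quadrature signals from instantaneous waveforms, and the function name is hypothetical.

```python
import cmath
import math

A = cmath.exp(2j * math.pi / 3)  # 120-degree rotation operator "a"


def sequence_components(va: complex, vb: complex, vc: complex) -> tuple[complex, complex, complex]:
    """Return the (zero, positive, negative) sequence phasors of a three-phase set."""
    v0 = (va + vb + vc) / 3
    v_pos = (va + A * vb + A * A * vc) / 3
    v_neg = (va + A * A * vb + A * vc) / 3
    return v0, v_pos, v_neg
```

    For a balanced set (1∠0°, 1∠−120°, 1∠120°) the positive sequence is 1 and the other two vanish; a sag to 0.5 on phase a alone lowers the positive sequence to 5/6 and introduces zero- and negative-sequence components of −1/6 each.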

  • 239.
    Tobias, Eklund
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media.
    Spehar, Joakim
    CPlanner: Kursplaneringsprototyp med Design Science och Scrum [CPlanner: A course planning prototype with Design Science and Scrum]2013Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    Developing a planning system is a complex design problem that requires both a high degree of flexibility and structure. In the context of planning, several actors, activities, and resources must be considered. Expertise in planning is often concentrated in a few key individuals. It is therefore no coincidence that many businesses, organizations, and even universities currently conduct their planning in proven single-user systems such as Excel, even though there is a strong need for standardized multi-user systems. Uppsala University, despite its size (over 40,000 students, 6,200 employees, 130 programs, and 2,000 courses), is no exception. Course planning is conducted using a single-user system and depends on a number of key individuals to work. The essay aims to investigate and illustrate the problems associated with developing planning systems by developing a prototype of a course scheduling system. The research strategy used is Design Science, and the development methodology is Scrum. The prototype has been evaluated regularly during development through formative evaluation. The essay's knowledge contribution is methodological knowledge, which shows how Scrum and Design Science can be combined, and model knowledge, which shows the basic structure of a course scheduling system.

  • 240.
    Tsuruta, Naoya
    et al.
    Tokyo Univ Technol, Sch Media Sci, Tokyo, Japan.
    Teraoka, Takehiro
    Tokyo Univ Technol, Sch Media Sci, Tokyo, Japan.
    Kondo, Kunio
    Tokyo Univ Technol, Sch Media Sci, Tokyo, Japan.
    Hayashi, Masaki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    TV Show Template for Text Generated TV2018In: 2018 International Workshop on Advanced Image Technology (IWAIT), IEEE , 2018Conference paper (Refereed)
    Abstract [en]

    A technique that converts text-based documents into TV-show-like computer graphics animation has been developed. However, writing a script in its scripting language requires a variety of knowledge about TV shows, such as speech timing, camera angles, lighting, and so on. In this paper, we propose a TV show template that contains animation settings imitating a particular type of TV show, together with semi-automatic animation generation using it.
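
    The idea of a template carrying show-specific defaults can be sketched as a small data structure that turns plain dialogue into timed cues. Everything here is hypothetical: the `ShowTemplate` fields, the speaking-rate model for speech timing, and the round-robin camera choice are illustrative assumptions, not the paper's template format or its generator.

```python
from dataclasses import dataclass


@dataclass
class ShowTemplate:
    """Hypothetical template: default settings imitating one type of TV show."""
    name: str
    chars_per_second: float    # assumed speaking rate, used for speech timing
    pause_s: float             # gap inserted between consecutive lines
    camera_cycle: list[str]    # camera angles applied in rotation
    lighting: str              # one lighting preset for the whole show


def generate_cues(template: ShowTemplate, script: list[tuple[str, str]]) -> list[dict]:
    """Turn (speaker, line) pairs into timed cues using the template's defaults."""
    cues = []
    t = 0.0
    for i, (speaker, text) in enumerate(script):
        duration = len(text) / template.chars_per_second
        cues.append({
            "start": t,
            "duration": duration,
            "speaker": speaker,
            "camera": template.camera_cycle[i % len(template.camera_cycle)],
            "lighting": template.lighting,
        })
        t += duration + template.pause_s
    return cues
```

    The "semi-automatic" part would then be a user adjusting individual cues, while the template supplies the TV-specific knowledge that a script author would otherwise need.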

  • 241.
    Varshney, Ambuj
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Enabling Sustainable Networked Embedded Systems2018Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Networked Embedded Systems (NES) are small energy-constrained devices, typically with sensors, a radio, and some form of energy storage. The past several years have seen rapid growth in NES applications, with several predictions of billions of devices being deployed in the near future. As NES are deployed at large scale, a growing challenge is to support them for long periods of time without negatively impacting their physical or radio environment, i.e., in a sustainable manner. In this dissertation, we identify intertwined challenges that affect the sustainability of NES: co-existence on the shared wireless spectrum, energy consumption, and the cost of deployment and maintenance. We identify research directions to overcome these challenges and address them through six research papers.

    Firstly, NES have to co-exist with other wireless devices that operate on the shared wireless spectrum. A growing number of devices contending for the spectrum leads to increased interference among them. To enable NES to co-exist with other wireless devices, we investigate the use of electronically steerable directional (ESD) antennas. ESD antennas allow software-based control of the direction of maximum antenna gain on a per-packet basis and can operate within the severe energy constraints of NES. In the dissertation, we demonstrate that ESD antennas enable solutions that outperform the state of the art in sensing and communication in wireless sensor networks while operating on a single wireless channel, which reduces contention on the shared wireless spectrum.

    Secondly, we explore the emerging area of visible light sensing and communication to avoid the crowded radio frequency spectrum. Visible light can be an alternative or a complement to radio frequency for sensing and communication. We make two contributions in the dissertation to make visible light communication a viable option for NES. First, we design a novel visible light sensing architecture that supports sensing and communication at tens of microwatts of power; such ultra-low power consumption can make visible light sensing systems pervasive. Our second contribution brings high-speed visible light communication to energy-constrained NES. We design a novel visible light receiver that adapts to changing light conditions and to the energy constraints of the host device while supporting a throughput comparable to radio frequency standards for NES. With these contributions, we take a significant step towards enabling visible light-based sustainable NES.

    Finally, replacing batteries on sensor nodes significantly affects the sustainability of NES. Battery-free sensors that harvest small amounts of energy from the ambient environment have great potential to enable pervasive deployment of NES. To support wide-area deployments of battery-free sensors, we develop an ultra-low power, long-range communication mechanism. We demonstrate the ability to communicate over distances of up to a few kilometres while consuming tens of microwatts at the sensor device. Our contributions pave the way for wide-area deployment of battery-free, sustainable NES.

    Through the contributions made in the dissertation, we take a significant step towards the broader goal of sustainable NES. The work included in the dissertation significantly improves the state of the art in NES, in some cases by orders of magnitude.

    List of papers
    1. Directional Transmissions and Receptions for High-throughput Bulk Forwarding in Wireless Sensor Networks
    2015 (English)In: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, 2015, p. 351-364Conference paper, Published paper (Refereed)
    Abstract [en]

    We present DPT: a wireless sensor network protocol for bulk traffic that uniquely leverages electronically switchable directional (ESD) antennas. Bulk traffic is found in several scenarios, and supporting protocols based on standard antenna technology abound. ESD antennas may improve performance in these scenarios, for example, by reducing channel contention, as the antenna can steer the radiated energy only towards the intended receivers, and by extending the communication range at no additional energy cost. The corresponding protocol support, however, is largely missing. DPT addresses precisely this issue. First, while the network is quiescent, we collect link metrics across all possible antenna configurations. We use this information to formulate a constraint satisfaction problem (CSP) that allows us to find two multi-hop disjoint paths connecting source and sink, along with the corresponding antenna configurations. Domain-specific heuristics we conceive reduce the processing demands of solving the CSP, improving scalability. Second, the routing configuration we obtain is injected back into the network. During the actual bulk transfer, the source funnels data through the two paths by quickly alternating between them. Packet forwarding occurs deterministically at every hop. This allows the source to implicitly "clock" the entire pipeline, sparing the need to proactively synchronize the transmissions across the two paths. Our results, obtained in a real testbed using 802.15.4-compliant radios and custom ESD antennas we built, indicate that DPT approaches the maximum throughput supported by the link layer, peaking at 214 kbit/s in the settings we test.

    Keywords
    Directional antennas; Bulk data transmissions; Wireless sensor networks; Electronically controlled antennas
    National Category
    Computer Systems
    Research subject
    Computer Science with specialization in Computer Communication
    Identifiers
    urn:nbn:se:uu:diva-266348 (URN)10.1145/2809695.2809720 (DOI)000380612400028 ()9781450336314 (ISBN)
    Conference
    The 13th ACM Conference on Embedded Networked Sensor Systems (SenSys 2015), November 1-4, 2015, Seoul, South Korea
    Available from: 2015-11-08 Created: 2015-11-08 Last updated: 2018-03-16Bibliographically approved
    2. dRTI: Directional Radio Tomographic Imaging
    2015 (English)In: Proceedings of the 14th International Conference on Information Processing in Sensor Networks, 2015, p. 166-177Conference paper, Published paper (Refereed)
    Abstract [en]

    Radio tomographic imaging (RTI) enables device-free localisation of people and objects in many challenging environments and situations. Its basic principle is to detect changes in the statistics of radio signals caused by people or objects obstructing radio links. However, the localisation accuracy of RTI suffers from complicated multipath propagation in radio links. We propose to use inexpensive and energy-efficient electronically switched directional (ESD) antennas to improve the quality of radio link observations and, therefore, the localisation accuracy of RTI. We implement a directional RTI (dRTI) system to understand how directional antennas can be used to improve RTI localisation accuracy. We also study the impact of the choice of antenna directions on the localisation accuracy of dRTI and propose methods to effectively choose informative antenna directions that improve localisation accuracy while reducing overhead. Furthermore, we analyse radio link obstruction performance in both theory and simulation, as well as false positives and false negatives of the obstruction measurements, to show the superiority of directional communication for RTI. We evaluate the performance of dRTI in diverse indoor environments and show that dRTI significantly outperforms existing RTI localisation methods based on omni-directional antennas.
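
    The basic RTI principle, that an obstruction near a link's line of sight changes that link's signal statistics, is often modeled with an ellipse weight model: pixel j affects link l only if it lies inside a thin ellipse whose foci are the link's endpoints. The sketch below builds such a weight matrix and forms a crude back-projection image x = Wᵀy. It uses the generic omni-directional RTI model, not dRTI's directional method, and the excess-path parameter and function names are assumptions; practical RTI usually solves a regularized least-squares problem instead of plain back-projection.

```python
import math

Point = tuple[float, float]


def ellipse_weights(links: list[tuple[Point, Point]], pixels: list[Point],
                    excess: float = 0.2) -> list[list[float]]:
    """W[l][j] = 1/sqrt(d_l) if pixel j lies inside link l's ellipse, else 0.

    The ellipse has the link endpoints as foci; `excess` (metres) is how much
    longer than the direct path a tx -> pixel -> rx detour may be.
    """
    weights = []
    for tx, rx in links:
        d = math.dist(tx, rx)
        row = []
        for p in pixels:
            inside = math.dist(tx, p) + math.dist(p, rx) < d + excess
            row.append(1.0 / math.sqrt(d) if inside else 0.0)
        weights.append(row)
    return weights


def backproject(weights: list[list[float]], rss_change_db: list[float]) -> list[float]:
    """Crude image estimate: each pixel accumulates the attenuation of every
    link whose ellipse contains it."""
    n_pixels = len(weights[0])
    return [sum(row[j] * y for row, y in zip(weights, rss_change_db)) for j in range(n_pixels)]
```

    With a horizontal and a vertical 4 m link and attenuation observed only on the horizontal one, the pixel on the horizontal link's line of sight lights up while the others stay at zero.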

    Series
    IPSN ’15
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-252426 (URN)10.1145/2737095.2737118 (DOI)
    Conference
    ACM/IEEE IPSN 2015
    Available from: 2015-05-06 Created: 2015-05-06 Last updated: 2018-03-16
    3. LoRea: A backscatter architecture that achieves a long communication range
    2017 (English)In: Proc. 15th ACM Conference on Embedded Network Sensor Systems, New York: ACM Press, 2017Conference paper, Published paper (Refereed)
    Abstract [en]

    There is a long-standing assumption that radio communication over hundreds of meters needs to consume milliwatts of power at the transmitting device. In this paper, we demonstrate that this is not necessarily the case for some devices equipped with backscatter radios. We present LoRea, an architecture consisting of a tag, a reader, and multiple carrier generators that overcomes the power, cost, and range limitations of existing systems such as Computational Radio Frequency Identification (CRFID). LoRea achieves this by, first, generating narrow-band backscatter transmissions that improve receiver sensitivity; second, mitigating self-interference without the complex designs employed in RFID readers, by keeping the carrier signal and the backscattered signal apart in frequency; and finally, decoupling carrier generation from the reader and using devices such as WiFi routers and sensor nodes as sources of the carrier signal. An off-the-shelf implementation of LoRea costs 70 USD, a drastic reduction in price considering that commercial RFID readers cost 2000 USD. LoRea's range scales with carrier strength and with proximity to the carrier source, reaching a maximum of 3.4 km when the tag is located 1 m from a 28 dBm carrier source, while consuming 70 µW at the tag. When the tag is equidistant from the carrier source and the receiver, we can communicate up to 75 m, a significant improvement over existing RFID readers.
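
    Why range depends on proximity to the carrier source can be seen from a simple bistatic link budget: the signal suffers path loss on the carrier-to-tag leg, a modulation loss at the tag, and path loss again on the tag-to-reader leg. The sketch below uses the free-space (Friis) model with isotropic antennas; the 868 MHz example frequency and the 6 dB modulation loss are hypothetical, and the model ignores multipath, antenna gains, and receiver sensitivity, so it will not reproduce the paper's measured ranges. At very short distances such as 1 m the far-field assumption is only approximate.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB (far-field Friis model, isotropic antennas)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def backscatter_rx_dbm(carrier_dbm: float, d_carrier_tag_m: float, d_tag_reader_m: float,
                       freq_hz: float, modulation_loss_db: float = 6.0) -> float:
    """Power reaching the reader in a bistatic setup: carrier -> tag (path loss),
    reflection at the tag (modulation loss), tag -> reader (path loss again)."""
    return (carrier_dbm
            - fspl_db(d_carrier_tag_m, freq_hz)
            - modulation_loss_db
            - fspl_db(d_tag_reader_m, freq_hz))
```

    Because the two path losses add in dB, free-space received power depends on the product of the two distances, so keeping the tag close to the carrier source leaves more of the budget for the tag-to-reader leg; each doubling of either distance costs about 6 dB.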

    Place, publisher, year, edition, pages
    New York: ACM Press, 2017
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-335566 (URN)10.1145/3131672.3131691 (DOI)000462783500018 ()978-1-4503-5459-2 (ISBN)
    Conference
    SenSys 2017, November 5–8, Delft, The Netherlands
    Available from: 2017-11-05 Created: 2017-12-06 Last updated: 2019-08-28Bibliographically approved
    4. Towards wide-area backscatter networks
    2017 (English)In: Proc. 4th ACM Workshop on Hot Topics in Wireless, New York: ACM Press, 2017, p. 49-53Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2017
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-335565 (URN)10.1145/3127882.3127888 (DOI)978-1-4503-5140-9 (ISBN)
    Conference
    HotWireless 2017, October 16, Snowbird, UT
    Available from: 2017-10-16 Created: 2017-12-06 Last updated: 2018-03-16Bibliographically approved
    5. Battery-free Visible Light Sensing
    2017 (English)In: Proc. 4th ACM Workshop on Visible Light Communication Systems, New York: ACM Press, 2017, p. 3-8Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2017
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-335564 (URN)10.1145/3129881.3129890 (DOI)978-1-4503-5142-3 (ISBN)
    Conference
    VLCS 2017, October 16, Snowbird, UT
    Available from: 2017-10-16 Created: 2017-12-06 Last updated: 2018-03-16Bibliographically approved
    6. Visible Light Communication for Wearable Computing
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    Visible Light Communication (VLC) is emerging as a means to network computing devices that ameliorates many hurdles of radio-frequency (RF) communications, for example, the limited available spectrum. Enabling VLC in wearable computing, however, is challenging because mobility induces unpredictable, drastic changes in light conditions, for example, due to reflective surfaces and obstacles casting shadows. We experimentally demonstrate that such changes are so extreme that no single VLC receiver design can provide efficient performance across the board. The diversity found in current wearable devices complicates matters. Based on these observations, we present three different designs of VLC receivers that i) are individually orders of magnitude more efficient than the state of the art in a subset of the possible conditions, and ii) can be combined in a single unit that dynamically switches to the best-performing receiver based on the light conditions. Our evaluation indicates that dynamic switching incurs minimal overhead, and that we can obtain throughput on the order of Mbit/s at energy costs lower than many RF devices.
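
    Dynamic switching between receiver designs can be sketched as a threshold selector with hysteresis, so that light levels hovering near a boundary do not cause constant switching. The mapping of the three designs to dim, medium, and bright light, the lux boundaries, and the 20% hysteresis are hypothetical choices for illustration; the manuscript's actual switching criteria are not reproduced here.

```python
def select_receivers(levels: list[float], bounds: tuple[float, ...] = (50.0, 5000.0),
                     hysteresis: float = 0.2) -> list[int]:
    """Return the receiver index chosen for each light reading (e.g. in lux).

    Index 0 suits dim light, 1 medium, 2 bright (hypothetical split). Moving up
    requires exceeding a boundary by `hysteresis` (fraction); moving down
    requires falling below it by the same fraction.
    """
    state = None
    chosen = []
    for level in levels:
        if state is None:
            state = sum(level >= b for b in bounds)  # initial pick, no hysteresis
        while state < len(bounds) and level > bounds[state] * (1 + hysteresis):
            state += 1
        while state > 0 and level < bounds[state - 1] * (1 - hysteresis):
            state -= 1
        chosen.append(state)
    return chosen
```

    For readings `[10, 55, 70, 45, 30, 100000]`, the selector stays on the dim-light design at 55 lux, switches up at 70, holds at 45, drops back at 30, and jumps straight to the bright-light design at 100,000.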

    National Category
    Communication Systems
    Research subject
    Computer Science with specialization in Computer Communication
    Identifiers
    urn:nbn:se:uu:diva-346820 (URN)
    Available from: 2018-03-21 Created: 2018-03-21 Last updated: 2018-03-23
  • 242.
    Varshney, Ambuj
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Mottola, Luca
    SICS Swedish ICT, Sweden and Politecnico di Milano, Italy.
    Carlsson, Mats
    SICS Swedish ICT, Sweden.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Directional Transmissions and Receptions for High-throughput Bulk Forwarding in Wireless Sensor Networks. 2015. In: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, 2015, p. 351-364. Conference paper (Refereed)
    Abstract [en]

    We present DPT, a wireless sensor network protocol for bulk traffic that uniquely leverages electronically switchable directional (ESD) antennas. Bulk traffic occurs in several scenarios, and supporting protocols based on standard antenna technology abound. ESD antennas may improve performance in these scenarios, for example, by reducing channel contention, since the antenna can steer the radiated energy only towards the intended receivers, and by extending the communication range at no additional energy cost. The corresponding protocol support, however, is largely missing. DPT addresses precisely this issue. First, while the network is quiescent, we collect link metrics across all possible antenna configurations. We use this information to formulate a constraint satisfaction problem (CSP) that allows us to find two disjoint multi-hop paths connecting source and sink, along with the corresponding antenna configurations. Domain-specific heuristics that we devise reduce the processing demands of solving the CSP, improving scalability. Second, the routing configuration we obtain is injected back into the network. During the actual bulk transfer, the source funnels data through the two paths by quickly alternating between them. Packet forwarding occurs deterministically at every hop. This allows the source to implicitly "clock" the entire pipeline, removing the need to proactively synchronize the transmissions across the two paths. Our results, obtained in a real testbed using 802.15.4-compliant radios and custom ESD antennas we built, indicate that DPT approaches the maximum throughput supported by the link layer, peaking at 214 kbit/s in the settings we test.
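The forwarding phase described above, in which the source alternates packets between two disjoint paths and each hop forwards deterministically, can be sketched as a toy timing model. The sketch assumes the two paths are already known (in DPT they come from solving the CSP, which is not modelled here), that the source injects one packet per fixed slot, and that every hop forwards after a fixed `hop_time`. The names `node_disjoint` and `alternate_paths` are hypothetical.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def node_disjoint(path_a: Sequence[str], path_b: Sequence[str]) -> bool:
    """True if both paths join the same source and sink and share no relay node."""
    return (path_a[0] == path_b[0] and path_a[-1] == path_b[-1]
            and not set(path_a[1:-1]) & set(path_b[1:-1]))


def alternate_paths(packets: Sequence[str], path_a: Sequence[str], path_b: Sequence[str],
                    slot: float = 1.0, hop_time: float = 1.0) -> List[Tuple[str, int, float]]:
    """Assign packets to the two paths in strict alternation.

    Hypothetical timing model: the source sends one packet per slot and every
    hop forwards after a fixed hop_time. Returns (packet, path index, arrival at sink).
    """
    if not node_disjoint(path_a, path_b):
        raise ValueError("paths must be node-disjoint")
    schedule = []
    # cycle() repeats (0, path_a), (1, path_b), (0, path_a), ...
    for i, (pkt, (idx, path)) in enumerate(zip(packets, cycle(enumerate((path_a, path_b))))):
        hops = len(path) - 1
        schedule.append((pkt, idx, i * slot + hops * hop_time))
    return schedule
```

Because forwarding delays are fixed, every arrival time follows directly from the injection slot. This is the sense in which the source "clocks" the pipeline without explicit synchronization. Alternation also means that each relay handles a packet only every other slot.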

  • 243.
    Varshney, Ambuj
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Mottola, Luca
    SICS Swedish ICT, Sweden and Politecnico di Milano, Italy.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Coordination of Wireless Sensor Networks using Visible Light. 2015. In: SenSys '15: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, 2015, p. 421-422. Conference paper (Refereed)
    Abstract [en]

    Wireless sensor networks (WSNs) are often deployed indoors, where artificial lighting is present. Indoor lighting increasingly consists of Light Emitting Diodes (LEDs), which offer the ability to precisely control the intensity and the frequency of the light carrier. This can be used to coordinate WSNs: periodic variations in the light intensity can synchronise the clocks on the sensor nodes, while the ability to modulate the light carrier enables the transmission of control information such as channel assignments or transmission schedules. We present Guidelight, a simple mechanism that uses controlled fluctuations in the light intensity to coordinate sensor nodes. Guidelight can wake up or time-synchronise sensor nodes, or even send small amounts of control information to them. Each of these functions currently has a separate dedicated solution in WSNs; Guidelight aims to provide a single solution to all of them. Our initial experiments demonstrate that Guidelight can trigger sensor nodes selectively with a mean error of 21 μs.
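The abstract does not specify how Guidelight modulates the light. The sketch below assumes plain on-off keying of the LED intensity as sampled by a node's light sensor. A rising edge through a threshold marks a wake-up or synchronisation event, and later symbol periods carry one bit each. The threshold, `samples_per_bit` and both function names are hypothetical parameters for illustration only.

```python
from typing import List, Optional, Sequence


def detect_trigger(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first rising crossing of the threshold (a wake-up/sync edge), or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            return i
    return None


def decode_ook(samples: Sequence[float], samples_per_bit: int, threshold: float) -> List[int]:
    """Decode on-off keyed bits: average each symbol period and compare with the threshold.

    A trailing partial symbol period is ignored.
    """
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        window = samples[start:start + samples_per_bit]
        bits.append(1 if sum(window) / samples_per_bit >= threshold else 0)
    return bits
```

Averaging over a symbol period makes the decoder tolerant to single noisy samples. The timing accuracy of a trigger, however, is limited by the sampling interval, so a real node would need fast sampling or a hardware comparator to approach microsecond-level error.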

  • 244.
    Voigt, Thiemo
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hewage, Kasun
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Alm, Per
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Logopedi.
    Smartphone Support for Persons Who Stutter. 2014. In: Proc. 13th International Symposium on Information Processing in Sensor Networks, Piscataway, NJ: IEEE Press, 2014, p. 293-294. Conference paper (Refereed)
    Abstract [en]

    Stuttering is a very complex speech disorder that affects around 0.7% of adults, while around 5% of the population have stuttered at some point. A large percentage of those affected tend to speak more fluently when their own speech is played back to their ear with some type of alteration. While this has been done with dedicated devices, smartphones can also be used for this purpose. We report on our initial experiences in building such an application and demonstrate delay problems caused by the lack of real-time support for audio playback in the Android operating system. We also discuss ideas for future work to improve app support for people who stutter.
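The abstract does not say which alteration the app applies. Delayed auditory feedback (DAF), where the speaker hears their own voice after a short fixed delay, is one common form. The sketch below assumes that form and shows the core of it, a sample-level delay line. The class name `DelayLine` and its parameters are hypothetical and not taken from the paper's app.

```python
from collections import deque


class DelayLine:
    """Delay an audio sample stream by a fixed time (delayed auditory feedback)."""

    def __init__(self, delay_ms: float, sample_rate_hz: int) -> None:
        # Convert the intended delay into a whole number of samples.
        self.delay_samples = round(delay_ms * sample_rate_hz / 1000)
        # Pre-fill with silence so the first outputs are zeros.
        self._buf = deque([0.0] * self.delay_samples)

    def process(self, sample: float) -> float:
        """Push one input sample and return the sample from delay_samples ago."""
        self._buf.append(sample)
        return self._buf.popleft()
```

For example, a 100 ms delay at 44.1 kHz corresponds to 4410 samples. The delay line only controls this intended delay. Any additional buffering in the platform's audio input and output path adds to it, which is why missing real-time audio support matters for this kind of app.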

  • 245.
    Wajima, Koji
    et al.
    Tokyo University of Science.
    Hayashi, Masaki
    Gotland University, School of Game Design, Technology and Learning Processes.
    Hurukawa, Toshihiro
    Tokyo University of Science.
    On reusable CG object for MMD in T2V. 2013. Conference paper (Other academic)
    Abstract [en]

    The production of Computer Graphics (CG) by individuals has become widespread. MMD (MikuMikuDance), a free CG movie production tool, is well known for its use by volunteers to produce CG character animations. Videos produced in MMD are very popular among fans of the characters. Many people want to create their own animations; however, the tool requires considerable technical skill, which is a major barrier for such individuals. T2V (Text-To-Vision), on the other hand, has been proposed as an easy-to-use production tool for character animation aimed at people without much animation experience. If CG objects from MMD can be reused in T2V, letting users make animations easily, we can expect further growth of content-sharing sites. In this paper, we compare the CG formats of MMD and T2V and perform a test rendering to validate our method. We then clarify the issues involved in reusing CG objects between MMD and T2V for future applications.

  • 246.
    Wang, Xiaojie
    et al.
    NeuSoft Corp, Shenyang, Liaoning, Peoples R China; Dalian Univ Technol, Sch Software, Dalian, Peoples R China.
    Ning, Zhaolong
    Dalian Univ Technol, Sch Software, Dalian, Peoples R China; Kyushu Univ, Fukuoka, Fukuoka, Japan.
    Hu, Xiping
    Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing, Peoples R China.
    Ngai, Edith
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Imperial Coll London, London, England.
    Wang, Lei
    Dalian Univ Technol, Sch Software, Dalian, Peoples R China; Bell Labs Res China, Shanghai, Peoples R China; Samsung, Seoul, South Korea; Washington State Univ, Vancouver, WA USA.
    Hu, Bin
    Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China; Tsinghua Univ, Beijing, Peoples R China; Swiss Fed Inst Technol, Zurich, Switzerland; ACM China, Beijing, Peoples R China.
    Kwok, Ricky
    Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China; HKIE, Hong Kong, Hong Kong, Peoples R China; IET, Hong Kong, Hong Kong, Peoples R China.
    A City-Wide Real-Time Traffic Management System: Enabling Crowdsensing in Social Internet of Vehicles. 2018. In: IEEE Communications Magazine, ISSN 0163-6804, E-ISSN 1558-1896, Vol. 56, no 9, p. 19-25. Article in journal (Refereed)
    Abstract [en]

    As an emerging platform based on Intelligent Transportation Systems (ITS), the Social Internet of Vehicles (SIoV) is promising for traffic management and road safety applications in smart cities. However, the end-to-end delay is large in store-carry-and-forward-based vehicular networks, and this has become the main obstacle to implementing large-scale SIoV. With the widespread use of mobile devices, crowdsensing is a promising way to enable real-time content dissemination in a city-wide traffic management system. This article first provides an overview of several promising research areas for traffic management in SIoV. Given the significance of traffic management in urban areas, we investigate a crowdsensing-based framework that provides timely responses for traffic management in heterogeneous SIoV. Participating vehicles communicate via device-to-device (D2D) links and integrate trajectory and topology information to dynamically regulate their social behaviors according to network conditions. A performance evaluation based on real-world taxi trajectory analysis demonstrates the effectiveness of the designed framework. Finally, we discuss several future research challenges before concluding our work.

  • 247.
    Wang, Yang
    et al.
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Guan, Nan
    Hong Kong Polytech Univ, Hong Kong, Hong Kong, Peoples R China.
    Sun, Jinghao
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Lv, Mingsong
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    He, Qingqiang
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    He, Tianzhang
    Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Wang, Yi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Northeastern Univ, Shenyang, Liaoning, Peoples R China.
    Benchmarking OpenMP Programs for Real-Time Scheduling. 2017. In: 2017 IEEE 23rd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), IEEE Computer Society, 2017. Conference paper (Refereed)
    Abstract [en]

    Real-time systems are shifting from single-core to multi-core processors, and software must be parallelized to fully utilize the computing power of multi-core architectures. OpenMP is a popular parallel programming framework in general-purpose and high-performance computing, and has recently drawn considerable interest in embedded and real-time computing. Much recent work addresses real-time scheduling of OpenMP-based parallel workloads. However, these studies evaluate their methods on randomly generated task systems, which cannot adequately represent the structural features of OpenMP workloads. This paper presents a benchmark suite, ompTGB, to support research on real-time scheduling of OpenMP-based parallel tasks. ompTGB not only collects realistic OpenMP programs but also models them as task graphs, so that real-time scheduling researchers can easily understand and use them. We also present a new response time bound for a subset of OpenMP programs and use it to demonstrate the usage of ompTGB.
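The abstract does not state the new bound. As a point of reference, the sketch below computes the classical bound due to Graham for a single DAG task executing alone under any work-conserving scheduler on m identical cores: R <= len + (vol - len) / m. Here vol is the total WCET of all nodes and len is the WCET sum along the longest path. This bound is a swapped-in baseline, not the paper's. The dictionary-based task-graph encoding and the function name are assumptions.

```python
from typing import Dict, List


def graham_bound(wcet: Dict[str, float], succ: Dict[str, List[str]], m: int) -> float:
    """Classical response-time bound len + (vol - len) / m for one DAG task on m cores."""
    vol = sum(wcet.values())  # total work of all nodes
    memo: Dict[str, float] = {}

    def longest_from(v: str) -> float:
        # Longest (critical) path starting at v, including v's own WCET.
        if v not in memo:
            memo[v] = wcet[v] + max((longest_from(s) for s in succ.get(v, [])), default=0.0)
        return memo[v]

    length = max(longest_from(v) for v in wcet)
    return length + (vol - length) / m
```

For a fork-join graph a -> {b, c} -> d with WCETs 1, 4, 2 and 1, vol = 8 and len = 6 (the path a, b, d). The bound is therefore 7.0 on two cores and 8.0 on one core, where it equals the total work.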

  • 248.
    Wang, Yi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Towards Customizable CPS: Composability, Efficiency and Predictability. 2017. In: Formal Methods and Software Engineering / [ed] Duan, Z.; Ong, L., Springer, 2017, p. 3-15. Conference paper (Refereed)
    Abstract [en]

    Today, many industrial products are defined by software and are therefore customizable by installing new applications on demand: their functionality is implemented in software and can be modified and extended through software updates. This trend towards customizable products is extending into all domains of IT, including Cyber-Physical Systems (CPS) such as cars, robotics, and medical devices. However, these systems are often highly safety-critical. The current state of practice allows hardly any modifications once safety-critical systems are in operation, owing to the lack of techniques for preserving crucial safety conditions in the modified system, and this severely restricts the benefits of software. This work aims at new paradigms and technologies for the design and safe software updates of CPS at operation time, subject to stringent timing constraints, dynamic workloads, and limited resources on complex computing platforms. There are essentially three key challenges to enabling modular, incremental and safe software updates over a system's lifetime: composability, resource efficiency and predictability. We present research directions to address these challenges: (1) open architectures and implementation schemes for building composable systems, (2) fundamental issues in real-time scheduling, aiming at a theory of multi-resource (including multiprocessor) scheduling, and (3) new-generation techniques and tools for fully separated verification of timing and functional properties of real-time systems with significantly improved efficiency and scalability. The tools shall support not only verification but also code generation tailored both for co-simulation interfaced with existing design tools such as OpenModelica (for modeling and simulation of physical components) and for deployment on given computing platforms.

  • 249.
    Wei, Bo
    et al.
    University of New South Wales, Sydney, Australia and CSIRO, Brisbane, Australia and SICS, Stockholm, Sweden.
    Varshney, Ambuj
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Patwari, Neal
    University of Utah, Salt Lake City and Xandem Technology, Salt Lake City.
    Hu, Wen
    University of New South Wales, Sydney, Australia and CSIRO, Brisbane, Australia and SICS, Stockholm, Sweden.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Chou, Chun Tung
    University of New South Wales, Sydney, Australia.
    dRTI: Directional Radio Tomographic Imaging. 2015. In: Proceedings of the 14th International Conference on Information Processing in Sensor Networks, 2015, p. 166-177. Conference paper (Refereed)
    Abstract [en]

    Radio tomographic imaging (RTI) enables device-free localisation of people and objects in many challenging environments and situations. Its basic principle is to detect changes in the statistics of radio signals caused by people or objects obstructing radio links. However, the localisation accuracy of RTI suffers from complicated multipath propagation in radio links. We propose to use inexpensive and energy-efficient electronically switched directional (ESD) antennas to improve the quality of radio link observations and, therefore, the localisation accuracy of RTI. We implement a directional RTI (dRTI) system to understand how directional antennas can be used to improve RTI localisation accuracy. We also study the impact of the choice of antenna directions on the localisation accuracy of dRTI and propose methods to effectively choose informative antenna directions, improving localisation accuracy while reducing overhead. Furthermore, we analyse radio link obstruction performance both in theory and in simulation, as well as the false positives and false negatives of the obstruction measurements, to show the superiority of directional communication for RTI. We evaluate the performance of dRTI in diverse indoor environments and show that dRTI significantly outperforms existing RTI localisation methods based on omni-directional antennas.
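The basic RTI principle above can be illustrated with an omni-directional baseline. Each link "sees" the pixels inside a thin ellipse whose foci are its two nodes, and an attenuation image is formed from the per-link signal-strength drops. The sketch uses the widely used ellipse weight model with 1/sqrt(d) scaling and plain back-projection. Practical RTI systems usually solve a regularised least-squares problem instead, and the antenna-direction selection that distinguishes dRTI is not modelled. The ellipse width `lam` and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ellipse_weights(links: Sequence[Tuple[Point, Point]], pixels: Sequence[Point],
                    lam: float = 0.2) -> List[List[float]]:
    """Weight matrix W[link][pixel]: nonzero if the pixel lies inside the link's ellipse."""
    W = []
    for tx, rx in links:
        d = math.dist(tx, rx)
        row = []
        for p in pixels:
            inside = math.dist(tx, p) + math.dist(p, rx) < d + lam
            # Longer links spread their attenuation over more pixels, so weight them less.
            row.append(1.0 / math.sqrt(d) if inside else 0.0)
        W.append(row)
    return W


def back_project(W: List[List[float]], delta_rss: Sequence[float]) -> List[float]:
    """Pixel score = sum over links of weight * RSS drop measured on that link."""
    return [sum(W[k][j] * delta_rss[k] for k in range(len(W))) for j in range(len(W[0]))]
```

With two parallel 4 m links and one pixel at the midpoint of each, a 5 dB drop on the first link produces the image [2.5, 0.0]. The obstruction is located on the first link, which is exactly the kind of link-level evidence that multipath blurs and directional antennas aim to sharpen.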

  • 250.
    Yang, Yu
    et al.
    Royal Inst Technol KTH, Stockholm, Sweden.
    Stathis, Dimitrios
    Royal Inst Technol KTH, Stockholm, Sweden.
    Sharma, Prashant
    Royal Inst Technol KTH, Stockholm, Sweden.
    Paul, Kolin
    Indian Inst Technol Delhi, Delhi, India.
    Hemani, Ahmed
    Royal Inst Technol KTH, Stockholm, Sweden.
    Grabherr, Manfred
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution.
    Ahmad, Rafi
    Inland Norway Univ Appl Sci, Hamar, Norway.
    RiBoSOM: Rapid Bacterial Genome Identification Using Self-Organizing Map implemented on the Synchoros SiLago Platform. 2018. In: 2018 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XVIII), Association for Computing Machinery (ACM), 2018, p. 105-114. Conference paper (Refereed)
    Abstract [en]

    Artificial Neural Networks (ANNs) have been applied to many traditional machine learning applications in image and speech processing. More recently, ANNs have caught the attention of the bioinformatics community because they can both speed up analysis, by not requiring genome assembly, and work with imperfect data sets containing duplications. ANNs for bioinformatics also have the added attraction of scaling better for massive parallelism than traditional bioinformatics algorithms. In this paper, we adapt Self-Organizing Maps (SOMs) for rapid identification of bacterial genomes, an approach we call BioSOM. BioSOM has been implemented on a design of two coarse-grain reconfigurable fabrics customized for dense linear algebra and streaming scratchpad memory, respectively. These fabrics are implemented in a novel synchoros VLSI design style that enables composition by abutment. Synchoricity enables rapid and accurate synthesis from Matlab models, yielding efficiency close to that of an ASIC. This platform, called SiLago (Silicon Lego), is benchmarked against a GPU implementation. SiLago implementations of BioSOMs in four different dimensions (128, 256, 512 and 1024 neurons) were trained for two E. coli strains of bacteria with 40K training vectors. The results of the SiLago implementation were benchmarked against a GTX 1070 GPU implementation in the CUDA framework. The comparison reveals a 4 to 140x speedup and a 4 to 5 orders of magnitude improvement in energy-delay product compared to the GPU implementation. This extreme efficiency comes with the added benefit of automated generation of GDSII-level designs from Matlab using the synchoros VLSI design style.
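For readers unfamiliar with self-organizing maps, the sketch below shows the standard training loop in plain Python. It finds the best-matching unit (BMU) for an input vector, then pulls the BMU and its grid neighbours towards the input with a Gaussian neighbourhood that shrinks over time. The sketch assumes a one-dimensional neuron grid, linearly decaying learning rate and neighbourhood width, and random initialisation. These are illustrative choices, not the paper's configuration. It does not model the genome encoding or the SiLago hardware mapping, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def best_matching_unit(weights: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the neuron whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def som_step(weights: Sequence[Vector], x: Sequence[float],
             lr: float, sigma: float) -> Tuple[List[Vector], int]:
    """One SOM update on a 1-D neuron grid with a Gaussian neighbourhood."""
    bmu = best_matching_unit(weights, x)
    new = []
    for i, w_i in enumerate(weights):
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # neighbourhood strength
        new.append([w + lr * h * (v - w) for w, v in zip(w_i, x)])
    return new, bmu


def train_som(data: Sequence[Sequence[float]], n_neurons: int,
              epochs: int = 20, lr0: float = 0.5, seed: int = 0) -> List[Vector]:
    """Train with linearly decaying learning rate and neighbourhood width."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in data[0]] for _ in range(n_neurons)]
    sigma0 = max(n_neurons / 2, 1.0)
    total = epochs * len(data)
    for step in range(total):
        frac = step / total
        x = data[step % len(data)]
        weights, _ = som_step(weights, x, lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5))
    return weights
```

Every neuron update is independent once the BMU is known, and the BMU search is itself a set of independent distance computations. This is the kind of regular, parallel linear algebra that the abstract describes mapping onto the dense linear algebra fabric.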
