Uppsala University Publications
Search result: 151 - 200 of 252
  • 151.
    Helmisaari, Marc
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Det beroendeframkallande klicket: Engagerande och emotionella icke-spel [The addictive click: Engaging and emotional non-games], 2015. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    A new game genre has grown in popularity during the last five years, one that falls outside the classic definition of games: so-called "idle games". This study examines which elements in these games drive the player to keep playing, and how those elements can be analyzed with the help of the MDA and AARRR frameworks. Data have been collected from three popular idle games: Cookie Clicker, Clicker Heroes and AdVenture Capitalist. A survey was also sent to players of these games in order to better understand why they are popular. The results have then been analyzed with different design theories in order to examine which game mechanics create the urge to play, and why.

  • 152.
    Henriksson, Michael
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    A Cognitive Work Analysis as Basis for Development of a Compact C2 System to Support Air Surveillance Work, 2012. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    This Master of Science thesis was produced at SAAB Security and Defence Solutions. The purpose of the thesis is to analyze how air surveillance work can be carried out. This information is then used to give suggestions for the design of a new system containing only the most essential functionality. This is done by examining the available frameworks which can inform interface design, and by applying one framework to analyze work in a complete system used as the basis of the new Compact C2 system. The second part of the analysis is directed towards the stripped system (Compact C2), and both parts of the analysis are used to inform the interface design of the Compact C2 system. By using the full range of the chosen framework for analysis of the identification process in Swedish air surveillance work, some essential functions were identified that should also be supported in a Compact C2 system.

  • 153.
    Hermans, Frederik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Rensfelt, Olof
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Ngai, Edith
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Nordén, Lars-Åke
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Gunningberg, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    SoNIC: Classifying interference in 802.15.4 sensor networks, 2013. In: Proc. 12th International Conference on Information Processing in Sensor Networks, New York: ACM Press, 2013, p. 55-66. Conference paper (Refereed).
  • 154.
    Hnich, Brahim
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Information Science.
    Kiziltan, Zeynep
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Information Science.
    Walsh, Toby
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Information Science.
    Modelling a Balanced Academic Curriculum Problem, 2002. In: Proceedings CPAIOR 2002, 2002. Conference paper (Refereed).
  • 155.
    Hojjat, Hossein
    et al.
    Cornell Univ, Ithaca, NY 14853, USA.
    Rümmer, Philipp
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    McClurg, Jedidiah
    CU Boulder, Boulder, CO, USA.
    Cerny, Pavol
    CU Boulder, Boulder, CO, USA.
    Foster, Nate
    Cornell Univ, Ithaca, NY 14853, USA.
    Optimizing Horn Solvers for Network Repair, 2016. In: Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design (FMCAD 2016) / [ed] Piskac, R.; Talupur, M., IEEE, 2016, p. 73-80. Conference paper (Refereed).
    Abstract [en]

    Automatic program repair modifies a faulty program to make it correct with respect to a specification. Previous approaches have typically been restricted to specific programming languages and a fixed set of syntactic mutation techniques, e.g., changing the conditions of if statements. We present a more general technique based on repairing sets of unsolvable Horn clauses. Working with Horn clauses enables repairing programs from many different source languages, but also introduces challenges, such as navigating the large space of possible repairs. We propose a conservative semantic repair technique that only removes incorrect behaviors and does not introduce new behaviors. Our framework allows the user to request the best repairs: it constructs an optimization lattice representing the space of possible repairs, and uses a novel local search technique that exploits heuristics to avoid searching through sub-lattices with no feasible repairs. To illustrate the applicability of our approach, we apply it to problems in software-defined networking (SDN), and show how it helps network operators fix buggy configurations by properly filtering undesired traffic. We show that interval and Boolean lattices are effective choices of optimization lattices in this domain, and we enable optimization objectives such as modifying the minimal number of switches. We have implemented a prototype repair tool, and present preliminary experimental results on several benchmarks using real topologies and realistic repair scenarios in data centers and congested networks.
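    As a purely hypothetical illustration of the idea (ours, not an example from the paper), consider Horn clauses encoding packet reachability across a single forwarding rule fwd12, where blocked traffic must not reach switch s2:

        \begin{align*}
        \mathit{reach}(p, s_1) &\leftarrow \mathit{ingress}(p)\\
        \mathit{reach}(p, s_2) &\leftarrow \mathit{reach}(p, s_1) \land \mathit{fwd}_{12}(p)\\
        \mathit{false} &\leftarrow \mathit{reach}(p, s_2) \land \mathit{blocked}(p)
        \end{align*}

    If a buggy configuration makes fwd12(p) hold for every packet p, the clause set is unsolvable. A conservative repair in the paper's sense strengthens the forwarding guard, e.g. to fwd12(p) = ¬blocked(p): it only filters out undesired behaviors and introduces no new ones.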

  • 156.
    Homewood, Thomas
    et al.
    Swedish Institute of Computer Science.
    Norström, Christer
    Swedish Institute of Computer Science.
    Gunningberg, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Skitracker: Measuring skiing performance using a body-area network, 2013. In: Proc. 12th International Conference on Information Processing in Sensor Networks, New York: ACM Press, 2013, p. 319-320. Conference paper (Refereed).
  • 157. Hossain, Adnan
    Synliggörande av provfordonets elsystemstatus [Visualizing the test vehicle's electrical system status], 2016. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    The goal of the project is to develop one or more programs that meet the needs of the system developers. Today it is very problematic to determine a truck's configuration with respect to its electrical system: it requires a strong technical background and is very time-consuming. This in turn delays the system-testing process, and all testing of trucks takes longer. The trucks, which are prototypes, are built for future needs and require continuous testing. There is therefore great demand for a program that minimizes the time loss and does not require as much technical background. Development took place in Microsoft Visual Studio, using C# and ASP.NET. Data about the vehicles were handled in databases, with the data flow controlled using the query language SQL. The result of the project was a web application and an upgrade of an existing program. At this stage, the web application and the accompanying program are being tested among Scania employees, and future updates and adjustments are planned.

  • 158.
    Jacobsson, Martin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Orfanidis, Charalampos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Using software-defined networking principles for wireless sensor networks, 2015. In: Proc. 11th Swedish National Computer Networking Workshop, 2015. Conference paper (Refereed).
  • 159.
    Johansson, Magnus
    et al.
    Stockholm University, Department of Computer and Systems Sciences.
    Verhagen, Harko
    Stockholm University, Department of Computer and Systems Sciences.
    Massively multiple online role playing games as normative multiagent systems, 2009. In: Normative Multi-Agent Systems / [ed] Guido Boella, Pablo Noriega, Gabriella Pigozzi, and Harko Verhagen, 2009. Conference paper (Other academic).
  • 160.
    Jonsson, Kristoffer
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences.
    Lundberg, David
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences.
    Digital Interface for Intelligent Sensors, 2013. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    Digital Interface for Intelligent Sensors was a project whose goal was to create a digital network interface enabling easy distribution of data from different types of digital sensors to a central computer. The purpose was to replace the existing analogue data collection system in order to benefit from the advantages of digital communication. This required a software protocol that could be satisfactorily implemented on a microcontroller. Along with the software implementation, the specific objective was to design, construct and build an intelligent hardware sensor device. This device was to measure temperature, humidity, wind direction and wind speed by collecting information from suitable digital transducers.

    The project involved research into bus protocols as well as practical design and construction of circuits. A great deal of software was written during the project to get the device to work as expected. The Modbus protocol was found to be the best option for our specific software needs. As for the hardware, the core of the sensor device was based on an ATmega328 microcontroller. The ATmega328 proved to be a suitable hardware platform for implementing both the Modbus protocol and the code required to extract information from the transducers. By linking a computer to the system, acting as a master, weather data from the device could be logged.

    The device was successfully installed on the roof of Ångströmslaboratoriet, house 2. The complete system enables other digital Modbus devices to connect in order to communicate with the central computer. Having many devices can lead to rather complex systems; the system created in this project keeps track of all installed devices using addresses, making a complex system easy to manage.

    The project also involved a brief collaboration with another group constructing a different digital measuring device. That device was able to connect to the system using the same Modbus protocol and thereby communicate with the central computer.
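    Since the device speaks Modbus, one concrete detail may help the reader: every Modbus RTU frame ends with a CRC-16 checksum (initial value 0xFFFF, reflected polynomial 0xA001). A minimal, self-contained sketch in C (our illustration, not the thesis code; the example frame bytes are arbitrary):

        #include <stdint.h>
        #include <stdio.h>

        /* Standard Modbus RTU CRC-16: init 0xFFFF, reflected poly 0xA001. */
        static uint16_t modbus_crc16(const uint8_t *data, size_t len)
        {
            uint16_t crc = 0xFFFF;
            for (size_t i = 0; i < len; i++) {
                crc ^= data[i];
                for (int bit = 0; bit < 8; bit++)
                    crc = (crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1;
            }
            return crc;  /* appended to the frame, low byte first */
        }

        int main(void)
        {
            /* Example request: read 2 holding registers (function 0x03) from slave 1. */
            uint8_t frame[] = { 0x01, 0x03, 0x00, 0x00, 0x00, 0x02 };
            printf("CRC = 0x%04X\n", modbus_crc16(frame, sizeof frame));
            return 0;
        }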

  • 161.
    Jouet, Antoine
    et al.
    University of Angers.
    Gac, Pierre
    University of Angers.
    Hayashi, Masaki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Bachelder, Steven
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Nakajima, Masayuki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    When virtual reality meet television: Use of a motor Text-To-Vision adapted for the television, 2015. In: Proceedings of Art and Science Forum 2015, 2015. Conference paper (Other academic).
    Abstract [en]

    This paper explains and details an automated TV news show program using Text-To-Vision (T2V) technology. Today, 3D CG environments are used more and more often, even in classic media like TV. However, there is as yet no fully virtual TV news show that stars only virtual characters, is completely automated, and uses news sources available on the Internet. Thanks to T2V, we made it possible to create this kind of automatic news show system, with interactive avatars, facial expressions, and multiple modular and dynamic scenes.

  • 162.
    Kameoka, Masahiro
    et al.
    Tokyo University of Science.
    Furukawa, Toshihiro
    Tokyo University of Science.
    Hayashi, Masaki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    発見的手法によるWebニュースからのクイズ自動生成の試み [An attempt at automatic quiz generation from Web news using a heuristic method], 2015. In: Proceedings of Art and Science Forum 2015, 2015. Conference paper (Other academic).
    Abstract [en]

    In this study, as a content-production technology aimed at the re-use of Web content, we attempt the automatic generation of quiz content from Web news sources. Whereas related research mainly uses analytical methods, we use a heuristic method that simulates the way a person thinks when making a quiz from an original sentence stating the knowledge behind the quiz. We examined automatic quiz generation with our heuristic method and discuss the results.

  • 163.
    Kameoka, Masahiro
    et al.
    Tokyo University of Science.
    Hayashi, Masaki
    Gotland University, School of Game Design, Technology and Learning Processes.
    Furukawa, Toshihiro
    Tokyo University of Science.
    Improvement of Automatic BBS Visualization in T2V: Animation considering dialogue structure, 2013. Conference paper (Other academic).
    Abstract [en]

    T2V (Text-to-Vision) is a technology capable of automatically generating animated movies, to assist individuals without specialist knowledge of animation production. This paper presents an improvement of the function that animates BBSs (Bulletin Board Systems) with this technology. T2V includes a package (the 2ch converter) capable of animating "2channel" (the largest BBS in Japan). The present 2ch converter, however, does not support dialogue based on quotation marks, so it cannot produce animation with dialogue. In this paper we propose a method of animation production that takes the conversation structure of the BBS into account, to create more natural expression in the animation.

  • 164.
    Kaxiras, Stefanos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Ros, Alberto
    A New Perspective for Efficient Virtual-Cache Coherence, 2013. In: Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013, p. 535-546. Conference paper (Refereed).
    Abstract [en]

    Coherent shared virtual memory (cSVM) is highly coveted for heterogeneous architectures as it will simplify programming across different cores and manycore accelerators. In this context, virtual L1 caches can be used to great advantage, e.g., saving energy consumption by eliminating address translation for hits. Unfortunately, multicore virtual-cache coherence is complex and costly because it requires reverse translation for any coherence request directed towards a virtual L1. The reason is the ambiguity of the virtual address due to the possibility of synonyms. In this paper, we take a radically different approach than all prior work, which is focused on reverse translation. We examine the problem from the perspective of the coherence protocol. We show that if a coherence protocol adheres to certain conditions, it operates effortlessly with virtual caches, without requiring reverse translations even in the presence of synonyms. We show that these conditions hold in a new class of simple and efficient request-response protocols that use both self-invalidation and self-downgrade. This results in a new solution for virtual-cache coherence, significantly less complex and more efficient than prior proposals. We study design choices for TLB placement under our proposal and compare them against those under a directory-MESI protocol. Our approach allows for choices that are particularly effective, for example combining all per-core TLBs into a single logical TLB in front of the last-level cache. Significant area, energy, and performance benefits ensue as a result of simplifying the entire multicore memory organization.

  • 165.
    Khan, Muneeb
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Sembrant, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles, 2012. In: International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'12), IEEE Computer Society, 2012, p. 260-269. Conference paper (Refereed).
    Abstract [en]

    Performance loss caused by L1 instruction cache misses varies between different architectures and cache sizes. For processors employing power-efficient in-order execution with small caches, performance can be significantly affected by instruction cache misses. The growing use of low-power multi-threaded CPUs (with shared L1 caches) in general purpose computing platforms requires new efficient techniques for analyzing application instruction cache usage. Such insight can be achieved using traditional simulation technologies modeling several cache sizes, but the overhead of simulators may be prohibitive for practical optimization usage. In this paper we present a statistical method to quickly model application instruction cache performance. Most importantly we propose a very low-overhead sampling mechanism to collect runtime data from the application's instruction stream. This data is fed to the statistical model which accurately estimates the instruction cache miss ratio for the sampled execution. Our sampling method is about 10x faster than previously suggested sampling approaches, with average runtime overhead as low as 25% over native execution. The architecturally-independent data collected is used to accurately model miss ratio for several cache sizes simultaneously, with average absolute error of 0.2%. Finally, we show how our tool can be used to identify program phases with large instruction cache footprint. Such phases can then be targeted to optimize for reduced code footprint.
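    The statistical model itself is not reproduced here, but the classic connection between reuse distance and cache misses that underlies this kind of modeling is easy to state: for a fully associative LRU cache, a reference misses exactly when the number of distinct blocks touched since the last access to the same block is at least the cache capacity, so one pass over a trace can estimate miss ratios for several cache sizes at once. A toy sketch in C (our illustration; the trace and sizes are made up, and the paper's sampling and modeling are far more refined):

        #include <stdio.h>
        #include <string.h>

        #define MAX_BLOCKS 4096  /* distinct blocks tracked (assumption) */

        static unsigned long lru[MAX_BLOCKS];  /* lru[0] = most recent block */
        static int depth = 0;

        /* Returns the reuse (stack) distance of `block`, or -1 on first touch. */
        static int reuse_distance(unsigned long block)
        {
            for (int i = 0; i < depth; i++) {
                if (lru[i] == block) {
                    memmove(&lru[1], &lru[0], i * sizeof lru[0]);
                    lru[0] = block;
                    return i;  /* i distinct blocks touched in between */
                }
            }
            if (depth < MAX_BLOCKS) depth++;
            memmove(&lru[1], &lru[0], (depth - 1) * sizeof lru[0]);
            lru[0] = block;
            return -1;  /* cold miss */
        }

        int main(void)
        {
            unsigned long trace[] = { 1, 2, 3, 1, 2, 4, 1, 2, 3, 4 };
            int sizes[] = { 2, 4, 8 };  /* cache capacities in blocks */
            long n = sizeof trace / sizeof trace[0], misses[3] = { 0 };

            for (long t = 0; t < n; t++) {
                int d = reuse_distance(trace[t]);
                for (int s = 0; s < 3; s++)
                    if (d < 0 || d >= sizes[s])  /* cold, or evicted by LRU */
                        misses[s]++;
            }
            for (int s = 0; s < 3; s++)
                printf("%d blocks: miss ratio %.2f\n", sizes[s], (double)misses[s] / n);
            return 0;
        }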

  • 166.
    Koukos, Konstantinos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Efficient Execution Paradigms for Parallel Heterogeneous Architectures, 2016. Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    This thesis proposes novel, efficient execution paradigms for parallel heterogeneous architectures. The end of Dennard scaling threatens the effectiveness of DVFS in future nodes; therefore, new execution paradigms are required to exploit the non-linear relationship between performance and energy efficiency of memory-bound application regions. To attack this problem, we propose the decoupled access-execute (DAE) paradigm. DAE transforms regions of interest (at program level) into two coarse-grain phases, the access phase and the execute phase, whose voltage and frequency can be scaled independently. The access phase is intended to prefetch the data into the cache and is therefore expected to be predominantly memory-bound, while the execute phase runs immediately after the access phase (which has warmed up the cache) and is therefore expected to be compute-bound.

    DAE achieves good energy savings (on average 25% lower EDP) without performance degradation, as opposed to other DVFS techniques. Furthermore, DAE increases the memory-level parallelism (MLP) of memory-bound regions, which results in performance improvements for memory-bound applications. To automatically transform application regions to DAE, we propose compiler techniques to automatically generate and incorporate the access phase(s) in the application. Our work targets affine, non-affine, and even complex, general-purpose codes. Furthermore, we explore the benefits of software multi-versioning to optimize DAE in dynamic environments and to handle codes with statically unknown access-phase overheads. In general, applications automatically transformed to DAE by our compiler maintain (or in some cases even exceed) the good performance and energy efficiency of manually optimized DAE codes.

    Finally, to ease programming of heterogeneous systems (with integrated GPUs), we propose a novel system architecture that provides unified virtual memory with low overhead. The underlying insight behind our work is that existing data-parallel programming models are a good fit for relaxed memory consistency models (e.g., the heterogeneous-race-free model). This allows us to simplify the coherence protocol between the CPU and the GPU, as well as the GPU memory management unit. On average, we achieve 45% speedup and 45% lower EDP over the corresponding SC implementation.

    List of papers
    1. Towards more efficient execution: a decoupled access-execute approach
    2013 (English). In: Proc. 27th ACM International Conference on Supercomputing, New York: ACM Press, 2013, p. 253-262. Conference paper, Published paper (Refereed).
    Abstract [en]

    The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution.

    The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency, while maintaining good performance. To demonstrate this we built a task based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model's accuracy.

    Based on real hardware measurements we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore we demonstrate that our method can achieve high performance both in the presence and in the absence of a hardware prefetcher.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2013
    Keywords
    Task-Based Execution, Decoupled Execution, Performance, Energy, DVFS
    National Category
    Computer Systems
    Research subject
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-203239 (URN); 10.1145/2464996.2465012 (DOI); 978-1-4503-2130-3 (ISBN)
    Conference
    ICS 2013, June 10-14, Eugene, OR
    Projects
    LPGPU FP7-ICT-288653; UPMARC
    Funder
    EU, FP7, Seventh Framework Programme, ICT-288653; Swedish Research Council
    Available from: 2013-07-06 Created: 2013-07-05 Last updated: 2016-09-02. Bibliographically approved.
    2. Fix the code. Don't tweak the hardware: A new compiler approach to Voltage–Frequency scaling
    2014 (English). In: Proc. 12th International Symposium on Code Generation and Optimization, New York: ACM Press, 2014, p. 262-272. Conference paper, Published paper (Refereed).
    Place, publisher, year, edition, pages
    New York: ACM Press, 2014
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-212778 (URN); 978-1-4503-2670-4 (ISBN)
    Conference
    CGO 2014, February 15-19, Orlando, FL
    Projects
    UPMARC
    Available from: 2014-02-19 Created: 2013-12-13 Last updated: 2018-01-11. Bibliographically approved.
    3. Multiversioned decoupled access-execute: The key to energy-efficient compilation of general-purpose programs
    2016 (English). In: Proc. 25th International Conference on Compiler Construction, New York: ACM Press, 2016, p. 121-131. Conference paper, Published paper (Refereed).
    Abstract [en]

    Computer architecture design faces an era of great challenges in the attempt to simultaneously improve performance and energy efficiency. Previous hardware techniques for energy management have become severely limited, and thus compilers play an essential role in matching the software to the more restricted hardware capabilities. One promising approach is software decoupled access-execute (DAE), in which the compiler transforms the code into coarse-grain phases that are well matched to the Dynamic Voltage and Frequency Scaling (DVFS) capabilities of the hardware. While this method has proven efficient for statically analyzable codes, general-purpose applications pose significant challenges due to pointer aliasing, complex control flow and unknown runtime events. We propose a universal compile-time method to decouple general-purpose applications, using simple but efficient heuristics. Our solutions overcome the challenges of complex code and show that automatic decoupled execution significantly reduces the energy expenditure of irregular or memory-bound applications and even yields slight performance boosts. Overall, our technique achieves average energy-delay-product (EDP) improvements of over 20% (over 15% in energy and over 5% in performance) across 14 benchmarks from the SPEC CPU 2006 and Parboil benchmark suites, with peak EDP improvements surpassing 70%.

    Place, publisher, year, edition, pages
    New York: ACM Press, 2016
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-283200 (URN); 10.1145/2892208.2892209 (DOI); 000389808800012; 9781450342414 (ISBN)
    Conference
    CC 2016, March 17–18, Barcelona, Spain
    Projects
    UPMARC
    Available from: 2016-03-17 Created: 2016-04-11 Last updated: 2018-12-03. Bibliographically approved.
    4. Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead
    2016 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 1, article id 1. Article in journal (Refereed). Published.
    Abstract [en]

    This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices. Such mechanisms introduce severe bottlenecks in the system; therefore, we adopt the heterogeneous-race-free (HRF) memory model. The use of HRF simplifies the coherency protocol and the graphics processing unit (GPU) memory management unit (MMU). Our protocol optimizes CPU and GPU demands separately, with the GPU part being simpler while the CPU is more elaborate and latency aware. We achieve an average 45% speedup and 45% energy-delay product reduction (20% energy) over the corresponding SC implementation.

    Keywords
    Multicore; heterogeneous coherence; GPU MMU design; virtual coherence protocol; directory-less protocol
    National Category
    Computer Systems
    Identifiers
    urn:nbn:se:uu:diva-295765 (URN); 10.1145/2889488 (DOI); 000373904600001
    Projects
    UPMARC
    Funder
    EU, FP7, Seventh Framework Programme, FP7-ICT-288653; EU, European Research Council, TIN2012-38341-C04-03
    Available from: 2016-04-05 Created: 2016-06-09 Last updated: 2017-11-30. Bibliographically approved.
  • 167.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Towards more efficient execution: a decoupled access-execute approach, 2013. In: Proc. 27th ACM International Conference on Supercomputing, New York: ACM Press, 2013, p. 253-262. Conference paper (Refereed).
    Abstract [en]

    The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard coupled execution.

    The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency, while maintaining good performance. To demonstrate this we built a task based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model's accuracy.

    Based on real hardware measurements we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore we demonstrate that our method can achieve high performance both in the presence and in the absence of a hardware prefetcher.
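    To make the access/execute split concrete, here is a minimal sketch of decoupling a reduction loop in C (our illustration, not the authors' code; __builtin_prefetch is GCC/Clang-specific, and CHUNK is a tuning assumption roughly matching the private cache size):

        #include <stddef.h>
        #include <stdio.h>

        #define CHUNK 4096  /* elements per access/execute phase pair (assumption) */

        double sum_of_squares_dae(const double *a, size_t n)
        {
            double s = 0.0;
            for (size_t base = 0; base < n; base += CHUNK) {
                size_t end = base + CHUNK < n ? base + CHUNK : n;

                /* Access phase: prefetch only, no side effects. Memory-bound,
                 * so it could run at reduced voltage/frequency. */
                for (size_t i = base; i < end; i += 8)  /* 8 doubles = 64-byte line */
                    __builtin_prefetch(&a[i], 0, 3);

                /* Execute phase: compute-bound, data is already in the cache,
                 * so it can run at high frequency without stalling. */
                for (size_t i = base; i < end; i++)
                    s += a[i] * a[i];
            }
            return s;
        }

        int main(void)
        {
            static double a[100000];
            for (size_t i = 0; i < 100000; i++) a[i] = 1.0;
            printf("%f\n", sum_of_squares_dae(a, 100000));  /* 100000.0 */
            return 0;
        }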

  • 168.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Black-Schaffer, David
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Spiliopoulos, Vasileios
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models, 2013. In: PARMA 2013, 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013. Conference paper (Refereed).
    Abstract [en]

    This work demonstrates the potential of hardware and software optimization to improve the effectiveness of dynamic voltage and frequency scaling (DVFS). For software, we decouple data prefetch (access) and computation (execute) to enable optimal DVFS selection for each phase. For hardware, we use measurements from state-of-the-art multicore processors to accurately model the potential of per-core, zero-latency DVFS. We demonstrate that the combination of decoupled access-execute and precise DVFS has the potential to decrease EDP by 25-30% without reducing performance.

    The underlying insight in this work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency. For the memory-bound access phase, where we prefetch data into the cache from main memory, we can run at a reduced frequency and voltage without hurting performance. Thereafter, the execute phase can run much faster, thanks to the prefetching of the access phase, and achieve higher performance. This decoupled program behavior allows us to achieve more effective use of DVFS than standard coupled executions, which mix data access and compute.

    To understand the potential of this approach, we measure application performance and power consumption on a modern multicore system across a range of frequencies and voltages. From this data we build a model that allows us to analyze the effects of per-core, zero-latency DVFS. The results of this work demonstrate the significant potential for finer-grain DVFS in combination with DVFS-optimized software.

  • 169.
    Koukos, Konstantinos
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Ros, Alberto
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Kaxiras, Stefanos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
    Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead, 2016. In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 13, no 1, article id 1. Article in journal (Refereed).
    Abstract [en]

    This work proposes a novel scheme to facilitate heterogeneous systems with unified virtual memory. Research proposals implement coherence protocols for sequential consistency (SC) between central processing unit (CPU) cores and between devices. Such mechanisms introduce severe bottlenecks in the system; therefore, we adopt the heterogeneous-race-free (HRF) memory model. The use of HRF simplifies the coherency protocol and the graphics processing unit (GPU) memory management unit (MMU). Our protocol optimizes CPU and GPU demands separately, with the GPU part being simpler while the CPU is more elaborate and latency aware. We achieve an average 45% speedup and 45% energy-delay product reduction (20% energy) over the corresponding SC implementation.

  • 170.
    Kumar, Rakesh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Univ Edinburgh, Edinburgh, Midlothian, Scotland.
    Grot, Boris
    Univ Edinburgh, Edinburgh, Midlothian, Scotland.
    Nagarajan, Vijay
    Univ Edinburgh, Edinburgh, Midlothian, Scotland.
    Blasting Through The Front-End Bottleneck With Shotgun, 2018. In: ACM SIGPLAN Notices, 2018, Vol. 53, no 2, p. 30-42. Conference paper (Refereed).
    Abstract [en]

    The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction working sets. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between performance and metadata storage costs. This work introduces Shotgun, a BTB-directed front-end prefetcher powered by a new BTB organization that maintains a logical map of an application's instruction footprint, which enables high-efficacy prefetching at low storage cost. To map active code regions, Shotgun precisely tracks an application's global control flow (e.g., function and trap routine entry points) and summarizes local control flow within each code region. Because the local control flow enjoys high spatial locality, with most functions comprised of a handful of instruction cache blocks, it lends itself to a compact region-based encoding. Meanwhile, the global control flow is naturally captured by the application's unconditional branch working set (calls, returns, traps). Based on these insights, Shotgun devotes the bulk of its BTB capacity to branches responsible for the global control flow and a spatial encoding of their target regions. By effectively capturing a map of the application's instruction footprint in the BTB, Shotgun enables highly effective BTB-directed prefetching. Using a storage budget equivalent to a conventional BTB, Shotgun outperforms the state-of-the-art BTB-directed front-end prefetcher by up to 14% on a set of varied commercial workloads.
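    As an illustration of the region-based encoding idea (our sketch; the field sizes are assumptions, not the paper's exact layout), a BTB entry for an unconditional branch can pair the target with a small bit vector marking which instruction cache blocks around the target are expected to be touched, so that one entry can drive prefetches for a whole code region:

        #include <stdint.h>
        #include <stdio.h>

        #define BLOCK_SHIFT 6    /* 64-byte instruction cache blocks */
        #define REGION_BLOCKS 8  /* blocks summarized per region (assumption) */

        /* BTB entry for the global control flow (calls, returns, traps); the
         * footprint summarizes the local control flow near the target. */
        struct btb_entry {
            uint64_t branch_pc;  /* address of the unconditional branch */
            uint64_t target;     /* entry point of the target code region */
            uint8_t  footprint;  /* bit i set => fetch block (target>>6)+i */
        };

        static void prefetch_region(const struct btb_entry *e,
                                    void (*prefetch_block)(uint64_t))
        {
            uint64_t base = (e->target >> BLOCK_SHIFT) << BLOCK_SHIFT;
            for (int i = 0; i < REGION_BLOCKS; i++)
                if (e->footprint & (1u << i))
                    prefetch_block(base + ((uint64_t)i << BLOCK_SHIFT));
        }

        static void log_prefetch(uint64_t addr)
        {
            printf("prefetch block 0x%llx\n", (unsigned long long)addr);
        }

        int main(void)
        {
            /* Footprint 0x0B = blocks 0, 1 and 3 of the callee's region. */
            struct btb_entry e = { 0x400100, 0x401000, 0x0B };
            prefetch_region(&e, log_prefetch);
            return 0;
        }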

  • 171.
    Kähkönen, Christian
    Gotland University, School of Game Design, Technology and Learning Processes.
    How to create a 3D character model for a pre-existing live action film, that matches the characteristics of the intellectual property and the visual style of the chosen film, 2012. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    My aim is to find out how to create a 3D character model for a pre-existing live action film, give this character characteristics that match the intellectual property, and follow the visual style of the chosen film. As my example in this degree project, I chose Disney's adaptation of John Carter of Mars.

    I used my own pipeline, a collection of work methods from different artists, for the creation of the example 3D character model, though limited to taking the model through the first two steps, given the constraints of this thesis work.

    In order to create this model, I researched the universe of John Carter and the visual style of the film, and from that knowledge I designed a character to model in 3D.

    The finished 3D character model of this degree project was then compared to models from the production of John Carter of Mars, both by the author and through a survey, in order to evaluate the result.

  • 172.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles, 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 67. Article in journal (Refereed).
    Abstract [en]

    Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.

  • 173.
    Lampka, Kai
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    With Real-time Performance Analysis and Monitoring to Timing Predictable Use of Multi-core Architectures, 2013. In: Runtime Verification, Springer Berlin/Heidelberg, 2013, p. 400-402. Conference paper (Refereed).
  • 174.
    Lampka, Kai
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Elektrobit Automot, Erlangen, Germany.
    Bondorf, Steffen
    Univ Kaiserslautern, Distributed Comp Syst DISCO Lab, Kaiserslautern, Germany.
    Schmitt, Jens B.
    Univ Kaiserslautern, Distributed Comp Syst DISCO Lab, Kaiserslautern, Germany.
    Guan, Nan
    Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China.
    Wang, Yi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Generalized Finitary Real-Time Calculus, 2017. In: IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, IEEE, 2017. Conference paper (Other academic).
    Abstract [en]

    Real-Time Calculus (RTC) is a non-stochastic queuing theory for the worst-case performance analysis of distributed real-time systems. Workload as well as resources are modelled as piecewise-linear, pseudo-periodic curves, and the system under investigation is modelled as a sequence of algebraic operations over these curves. The memory footprint of the computed curves increases exponentially with the sequence of operations, so RTC can quickly become computationally infeasible. Recently, Finitary RTC has been proposed to counteract this problem. Finitary RTC restricts curves to finite input domains and thereby counteracts the memory-demand explosion seen with the pseudo-periodic curves of common RTC implementations. However, the proof of correctness of Finitary RTC specifically exploits the operational semantics of the greedy processing component (GPC) model and is tied to the maximum busy-window size. This is an inherent limitation which prevents a straightforward generalization. In this paper, we provide a generalized Finitary RTC that abstracts from the operational semantics of a specific component model and reduces the finite input domains of curves even further. The novel approach allows for faster computations and extends the Finitary RTC idea to a much wider range of RTC models.
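    For background, the standard RTC bounds (textbook network-calculus definitions, not this paper's contribution) relate an upper arrival curve α, which bounds the workload arriving in any window of length Δ, to a lower service curve β, which bounds the service offered in any such window:

        % backlog and delay bounds from (min,+) algebra
        \begin{align*}
        \text{backlog bound:} \quad B &= \sup_{\Delta \ge 0} \bigl( \alpha(\Delta) - \beta(\Delta) \bigr)\\
        \text{delay bound:}   \quad D &= \sup_{\Delta \ge 0} \inf \{ \tau \ge 0 : \alpha(\Delta) \le \beta(\Delta + \tau) \}
        \end{align*}

    Finitary RTC's observation is that, to evaluate such expressions, the pseudo-periodic curves only need to be stored up to a finite horizon (classically, the maximum busy-window size), which this paper generalizes and tightens further.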

  • 175.
    Lampka, Kai
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Siegle, Markus
    University of the Federal Forces Germany.
    A Symbolic Approach to the Analysis of Multi-Formalism Markov Reward Models, 2013. In: Theory and Application of Multi-Formalism Modeling, Pennsylvania: IGI Global, 2013, 1. Chapter in book (Refereed).
    Abstract [en]

    With complex systems and complex requirements being a challenge that designers must face to reach quality results, multi-formalism modeling offers tools and methods that allow modelers to exploit the benefits of different techniques in a general framework intended to address these challenges.

    Theory and Application of Multi-Formalism Modeling boldly explores the importance of this topic by gathering experiences, theories, applications, and solutions from diverse perspectives of those involved with multi-formalism modeling. Professionals, researchers, academics, and students in this field will be able to critically evaluate the latest developments and future directions of multi-formalism research.

  • 176.
    Lantz, Olof
    Uppsala University, Disciplinary Domain of Science and Technology, Physics, Department of Physics and Astronomy, Applied Nuclear Physics.
    Virtualiserad testmiljö: Utvärdering av virtualiseringsprogramvaror [Virtualized test environment: Evaluation of virtualization software], 2014. Independent thesis, Basic level (professional degree), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    Virtualization has been increasingly adopted over the last decade, and the use of virtualized environments is going to be an important part of how computers are used in the near future. There are many advantages to virtualization, and different methods have been developed to make it as efficient as possible.

    Forsmarks Kraftgrupp was interested in the possibility of taking advantage of virtualization in their testing environment.

    In this report, hypervisors of type 1 and type 2 as well as containers have been evaluated to determine which method and which program is preferable on a server cluster of four HP ProLiant DL380 Generation 4 machines. Because of the hardware specifications of the DL380, the focus has been on virtualization programs that do not require hardware-assisted virtualization.

    The results show that it is possible to use some of the type 2 hypervisors on the HP ProLiant DL380 Generation 4. The suggested virtualization programs are VMware Workstation or Oracle VirtualBox.

  • 177.
    Lind, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Distributed Ensemble Learning With Apache Spark, 2016. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
  • 178.
    Lindén, Jonatan
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jonsson, Bengt
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    A Skiplist-based Concurrent Priority Queue with Minimal Memory Contention, 2013. In: OPODIS 2013: 17th International Conference on Principles of Distributed Systems / [ed] Roberto Baldoni, Nicolas Nisse, Maarten van Steen, Berlin: Springer Berlin/Heidelberg, 2013, p. 206-220. Conference paper (Refereed).
    Abstract [en]

    Priority queues are fundamental to many multiprocessor applications. Several priority queue algorithms based on skiplists have been proposed, as skiplists allow concurrent accesses to different parts of the data structure in a simple way. However, for priority queues on multiprocessors, an inherent bottleneck is the operation that deletes the minimal element. We present a linearizable, lock-free, concurrent priority queue algorithm, based on skiplists, which minimizes the contention for shared memory that is caused by the DeleteMin operation. The main idea is to minimize the number of global updates to shared memory that are performed in one DeleteMin. In comparison with other skiplist-based priority queue algorithms, our algorithm achieves a 30-80% improvement.
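    To illustrate the logical-deletion idea that such algorithms build on (a deliberately simplified, single-level sketch under our own assumptions, not the authors' skiplist algorithm): DeleteMin claims the first unmarked node with a single atomic operation and defers the pointer updates that physically unlink nodes, which is what keeps the number of global memory updates per DeleteMin low:

        #include <stdatomic.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Sorted singly linked list standing in for the skiplist's bottom level. */
        struct node {
            int key;
            atomic_bool deleted;  /* logical deletion mark */
            struct node *next;
        };

        static struct node *head;  /* dummy head; smallest key first */

        /* Claim the first unmarked node with one atomic exchange. No pointers
         * are updated here; unlinking marked prefixes is deferred (batched),
         * and claimed nodes are reclaimed only when they are unlinked. */
        static struct node *delete_min(void)
        {
            for (struct node *n = head->next; n != NULL; n = n->next)
                if (!atomic_exchange(&n->deleted, true))
                    return n;  /* we won the claim on this node */
            return NULL;       /* queue empty */
        }

        /* Single-threaded helper for the demo; a concurrent insert would CAS. */
        static void insert_sorted(int key)
        {
            struct node *n = malloc(sizeof *n);
            n->key = key;
            atomic_init(&n->deleted, false);
            struct node *p = head;
            while (p->next && p->next->key < key) p = p->next;
            n->next = p->next;
            p->next = n;
        }

        int main(void)
        {
            static struct node dummy;  /* zero-initialized: unmarked, no next */
            head = &dummy;
            insert_sorted(3); insert_sorted(1); insert_sorted(2);
            for (struct node *m; (m = delete_min()) != NULL; )
                printf("min = %d\n", m->key);  /* prints 1, 2, 3 */
            return 0;
        }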

  • 179.
    Ljungberg, Jens
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Electricity.
    Evaluation of a Centralized Substation Protection and Control System for HV/MV Substation, 2018. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    Today, conventional substation protection and control systems are of a widely distributed character. One substation can easily have as many as 50 data processing points that all perform similar algorithms on voltage and current data. There is also only limited communication between protection devices, and each device is only aware of the bay in which it is installed. With the intent of implementing a substation protection system that is simpler, more efficient and better suited for future challenges, Ellevio AB implemented a centralized system in a primary substation in 2015. It comprises five components that each handle one type of duty: data processing, communication, voltage measurement, current measurement and breaker control. Since its implementation, the centralized system has been in parallel operation with the conventional one, meaning that it performs station-wide data acquisition, processing and communication but is unable to trip the station breakers. The only active functionality of the centralized system is the voltage regulation. This work is an evaluation of the centralized system and studies its protection functionality, voltage regulation, fault response and output signal correlation with the conventional system. It was found that the centralized system required the implementation of a differential protection function and protection of the capacitor banks and busbar coupling to provide protection equivalent to that of the conventional system. The voltage regulation showed unsatisfactorily long regulation times, which could be a result of low time resolution. The fault response and signal correlation were deemed satisfactory.

  • 180.
    Lundstedt, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Faculty of Science and Technology.
    Implementation and Evaluation of Image Retrieval Method Utilizing Geographic Location Metadata, 2009. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    Multimedia retrieval systems are very important today, with millions of content creators all over the world generating huge multimedia archives. Recent developments allow for content-based image and video retrieval, but these methods are often quite slow, especially when applied to a library of millions of media items.

    In this research a novel image retrieval method is proposed, which utilizes spatial metadata on images. By finding clusters of images based on their geographic location (the spatial metadata) and combining this information with existing content-based image retrieval algorithms, the proposed method enables efficient presentation of high-quality image retrieval results to system users.

    Clustering methods considered include Vector Quantization, Vector Quantization LBG and DBSCAN. Clustering was performed on three different similarity measures: spatial metadata, histogram similarity or texture similarity.

    For histogram similarity there are many different distance metrics to use when comparing histograms. Euclidean, Quadratic Form and Earth Mover's Distance were studied, as were three different color spaces: RGB, HSV and CIE Lab.
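    For reference, the Euclidean (L2) distance between two n-bin histograms h and g is sqrt(sum_i (h_i - g_i)^2). A minimal sketch in C (the bin count and values are arbitrary assumptions; compile with -lm):

        #include <math.h>
        #include <stdio.h>

        #define BINS 8  /* real systems use many more bins (assumption) */

        /* Euclidean (L2) distance between two normalized color histograms. */
        static double hist_l2(const double h[BINS], const double g[BINS])
        {
            double sum = 0.0;
            for (int i = 0; i < BINS; i++) {
                double d = h[i] - g[i];
                sum += d * d;
            }
            return sqrt(sum);
        }

        int main(void)
        {
            double h[BINS] = { 0.5, 0.5 };  /* remaining bins are zero */
            double g[BINS] = { 0.0, 0.5, 0.5 };
            printf("L2 distance = %.3f\n", hist_l2(h, g));  /* 0.707 */
            return 0;
        }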

  • 181.
    Löscher, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
    Tsiftes, Nicolas
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Handziski, Vlado
    Efficient and Flexible Sensornet Checkpointing, 2014. In: Wireless Sensor Networks, volume 8354, 2014, p. -65. Conference paper (Refereed).
    Abstract [en]

    Developing sensornet software is difficult, partly because of the limited visibility of the system state of deployed nodes. Sensornet checkpointing is a method that allows developers to save and restore the full system state of nodes. We present four extensions to sensornet checkpointing (compression, binary diffs, selective checkpointing, and checkpoint inspection) that reduce the time required for checkpointing operations considerably, and improve the granularity at which system state can be examined and manipulated down to the variable level. We show through an experimental evaluation that checkpoint sizes can be reduced by 70%-93%, and the time can be reduced by at least 50%, because of these improvements. The reduced time and increased granularity benefit multiple checkpointing use cases, including automated testing, network visualization, and software debugging.
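    As a toy illustration of the binary-diff extension (our sketch, not the paper's on-wire format): instead of transferring a full snapshot, only the (offset, byte) pairs that differ from the previous checkpoint need to be recorded:

        #include <stdint.h>
        #include <stdio.h>
        #include <stddef.h>

        /* Print a naive byte-level diff between two equal-sized snapshots and
         * return the number of differing bytes (the size of the "diff"). */
        static size_t binary_diff(const uint8_t *old, const uint8_t *cur, size_t len)
        {
            size_t changed = 0;
            for (size_t off = 0; off < len; off++)
                if (old[off] != cur[off]) {
                    printf("offset %zu: 0x%02X -> 0x%02X\n", off, old[off], cur[off]);
                    changed++;
                }
            return changed;
        }

        int main(void)
        {
            /* Two tiny "checkpoints" of node RAM; only two bytes changed. */
            uint8_t snap0[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
            uint8_t snap1[] = { 1, 2, 9, 4, 5, 6, 7, 0 };
            size_t n = binary_diff(snap0, snap1, sizeof snap0);
            printf("diff: %zu of %zu bytes\n", n, sizeof snap0);
            return 0;
        }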

  • 182.
    Mann, I. R.
    et al.
    Univ Alberta, Dept Phys, Edmonton, AB, Canada.
    Di Pippo, S.
    United Nations Off Vienna, Off Outer Space Affairs, Vienna, Austria.
    Opgenoorth, Hermann Josef
    Uppsala University, Disciplinary Domain of Science and Technology, Physics, Swedish Institute of Space Physics, Uppsala Division. Univ Leicester, Dept Phys & Astron, Leicester, Leics, England.
    Kuznetsova, M.
    NASA, Goddard Spaceflight Ctr, Greenbelt, MD USA.
    Kendall, D. J.
    Canadian Space Agcy, St Hubert, PQ, Canada.
    International Collaboration Within the United Nations Committee on the Peaceful Uses of Outer Space: Framework for International Space Weather Services (2018-2030), 2018. In: Space Weather: The international journal of research and applications, ISSN 1542-7390, E-ISSN 1542-7390, Vol. 16, no 5, p. 428-433. Article in journal (Other academic).
    Abstract [en]

    Severe space weather is a global threat that requires a coordinated global response. In this Commentary, we review some previous successful actions supporting international coordination between member states in the United Nations (UN) context and make recommendations for a future approach. Member states of the UN Committee on the Peaceful Uses of Outer Space (COPUOS) recently approved new guidelines related to space weather under actions for the long-term sustainability of outer space activities. This is to be followed by UN Conference on the Exploration and Peaceful Uses of Outer Space (UNISPACE)+50, which will take place in June 2018 on the occasion of the fiftieth anniversary of the first UNISPACE I held in Vienna in 1968. Expanded international coordination has been proposed within COPUOS under the UNISPACE+50 process, where priorities for 2018-2030 are to be defined under Thematic Priority 4: Framework for International Space Weather Services. The COPUOS expert group for space weather has proposed the creation of a new International Coordination Group for Space Weather be implemented as part of this thematic priority. This coordination group would lead international coordination between member states and across international stakeholders, monitor progress against implementation of guidelines and best practices, and promote coordinated global efforts in the space weather ecosystem spanning observations, research, modeling, and validation, with the goal of improved space weather services. We argue that such improved coordination at the international policy level is essential for increasing global resiliency against the threats arising from severe space weather.

  • 183.
    Mohammedsalih, Salah
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media, Human-Computer Interaction.
    Mobile Journalism: Using smartphone in journalistic work2017Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Mobile phones have had a drastic influence on media production by providing ubiquitous connectivity. This revolution came about when the smartphone turned into a powerful tool capable of almost all the production work previously done with specialized equipment and computers. It has encouraged ordinary individuals to become involved in media work, giving rise to the phenomenon of mobile journalism, in which citizens carry out work that for a long time was done only by professional journalists. Hundreds of thousands of prosumers and amateurs now make and cover news with their smartphones and contribute to journalistic work. This has become particularly apparent in reporting from remote and risky areas that journalists cannot reach easily or in time for important events, and it was obvious during the Arab Spring, when smartphones fed both social media and traditional media with instant photos and videos taken by the protesters themselves. This thesis focuses on the role of the smartphone in facilitating the work of journalists.

    As part of the literature review, the author went through many texts, watched videos and listened to radio shows in which journalists and other media workers talk about their own experience of practicing mobile journalism. The experience of technology and the user aspects of mobile journalism are then investigated from a phenomenological perspective. As the aim of this thesis is not to validate a hypothesis or a theory, a qualitative research method is used to arrive at an evaluation and explanation of the phenomenon of using mobile phones in journalism. For that purpose, several qualitative methods were used to collect data: auto-ethnography, observation, interviews and focus groups. The data were collected mainly in the Kurdistan region of northern Iraq, where journalists were covering the war from dangerous and risky battlefields.

    The findings show that the main factors that make smartphones powerful tools for journalists are: the low budget required to acquire a smartphone compared with the expensive equipment of traditional media; the freedom and independence that a mobile phone gives a journalist; and its design, a pocket-sized, inconspicuous tool that can be carried and used even in areas where journalistic work is not allowed. The ubiquity of the mobile phone has helped cover news in areas that traditional media cannot reach easily or at all. The availability of smartphones on the one hand, and their universal design on the other, have allowed many people to use them for journalistic work without any training courses. This situation has created a good opportunity for media institutions and TV stations to expand their correspondent networks across whole countries.

  • 184.
    Mohaqeqi, Morteza
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Abdullah, Jakaria
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Ekberg, Pontus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Yi, Wang
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Refinement of workload models for engine controllers by state space partitioning2017In: 29th Euromicro Conference on Real-Time Systems: ECRTS 2017, Dagstuhl, Germany: Leibniz-Zentrum für Informatik , 2017, p. 11:1-22Conference paper (Refereed)
  • 185.
    Mottola, Luca
    et al.
    SICS and Politecnico di Milano.
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Picco, G. P.
    Univ of Trento.
    Electronically-switched Directional Antennas for Wireless Sensor Networks: A Full-stack Evaluation2013In: IEEE SECON, 2013Conference paper (Refereed)
  • 186.
    Mustini, Jeton
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Development of a cloud service and a mobile client that visualizes business data stored in Microsoft Dynamics CRM2015Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    In this master thesis a prototype application is developed to help decision makers analyze data and present it so that business decisions can be made more easily. The application consists of a client application, a cloud service, and a Microsoft Dynamics CRM system. The client application is developed as a Windows Store App, and the cloud service as a web application using ASP.NET Web API. From the client, users connect to the cloud service by providing a set of user credentials. These credentials are then used against the user's Microsoft Dynamics CRM server to retrieve business data. A component in the cloud service transforms the data into useful information defined by key performance indicators. The user's hierarchical organization structure is also replicated in the cloud service, enabling users to drill down and up between organizational units and view their key performance indicators. These key performance indicators are finally returned to the client and presented on a dashboard using interactive charts.
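
    As a rough illustration of the drill-down idea (the thesis implements this in an ASP.NET Web API service, so the Python below, including the KPI definitions, is entirely hypothetical):

    from dataclasses import dataclass, field

    @dataclass
    class OrgUnit:
        name: str
        revenue: float
        deals_won: int
        deals_lost: int
        children: list = field(default_factory=list)

    def kpis(unit):
        """Aggregate KPIs for a unit and every unit below it."""
        revenue, won, lost = unit.revenue, unit.deals_won, unit.deals_lost
        for child in unit.children:
            sub = kpis(child)
            revenue += sub["revenue"]; won += sub["won"]; lost += sub["lost"]
        win_rate = won / (won + lost) if won + lost else 0.0
        return {"unit": unit.name, "revenue": revenue,
                "won": won, "lost": lost, "win_rate": win_rate}

    def drill_down(unit, child_name):
        """One drill-down step: KPIs for a named child unit."""
        child = next(c for c in unit.children if c.name == child_name)
        return kpis(child)

    sales = OrgUnit("Sales EMEA", 0, 0, 0, [
        OrgUnit("Sweden", 1.2e6, 14, 6),
        OrgUnit("Norway", 0.8e6, 9, 7),
    ])
    print(kpis(sales))                  # the whole region
    print(drill_down(sales, "Sweden"))  # one organizational unit down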

  • 187.
    Nakajima, Masayuki
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Current Topics in Computer Graphics: Report of SIGGRAPH20132013In: ITE Technical Report, ITE, 2013, p. 13-20Conference paper (Other (popular science, discussion, etc.))
    Abstract [en]

    CG, human interface, multimedia and virtual reality technology are improving rapidly these days across many fields, both in entertainment such as movies, TV and games, and in visualization for engineering, science and art. I report current topics from the 40th SIGGRAPH conference, SIGGRAPH 2013, held at the Anaheim Convention Center, California.

  • 188.
    Nakajima, Masayuki
    Gotland University, School of Game Design, Technology and Learning Processes.
    Intelligent CG Making Technology and Intelligent Media2013In: ITE Transactions on Media Technology and Applications, ISSN 2186-7364, Vol. 1, no 1, p. 20-26Article in journal (Refereed)
    Abstract [en]

    In this invited research paper, I will describe the Intelligent CG Making Technology (ICGMT) production methodology and Intelligent Media (IM). I will begin with an explanation of the key aspects of the ICGMT and a definition of IM. Thereafter I will explain the three approaches of the ICGMT. These approaches are the reuse of animation data, making animation from text, and making animation from natural spoken language. Finally, I will explain current approaches of the ICGMT under development by the Nakajima laboratory.

  • 189.
    Nakajima, Masayuki
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Chang, Youngha
    Tokyo City University.
    Mukai, Nobuhiko
    Tokyo City University.
    Color Similarity Metric Based on Categorical Color Perception2013In: ITE journal, ISSN 1342-6893, Vol. 67, no 3, p. 116-119Article in journal (Refereed)
    Abstract [en]

    The calculation of color difference is one of the most basic techniques in image processing fields. For example, color clustering and edge detection are the first steps of most image processes and we compute them by using a color difference formula. Although the CIELAB color difference formula is a commonly used one, the results obtained with it are not in accordance with human feelings when the color difference becomes large. In this paper, we have performed psychophysical experiments on color similarity between colors that have large color differences. We have then analyzed the results and found that the similarity is strongly restricted by the basic color categories. In accordance with this result, we propose a new color similarity metric based on the CIEDE2000 color difference formula and categorical color perception.
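
    The abstract does not give the proposed formula, so the following Python sketch only illustrates the general idea of combining a base color difference with a categorical penalty. For brevity it uses the plain CIELAB Euclidean distance (CIE76) rather than the CIEDE2000 formula the paper builds on, and the category prototypes and penalty value are invented.

    import math

    CATEGORY_PROTOTYPES = {          # toy (L*, a*, b*) prototypes of basic colors
        "red":   (53.2, 80.1, 67.2),
        "green": (87.7, -86.2, 83.2),
        "blue":  (32.3, 79.2, -107.9),
    }

    def cie76(c1, c2):
        return math.dist(c1, c2)     # Euclidean distance in Lab space

    def category_of(c):
        return min(CATEGORY_PROTOTYPES,
                   key=lambda k: cie76(c, CATEGORY_PROTOTYPES[k]))

    def categorical_distance(c1, c2, penalty=20.0):
        """Base color difference, inflated when the colors fall into
        different basic color categories (illustrative penalty only)."""
        d = cie76(c1, c2)
        if category_of(c1) != category_of(c2):
            d += penalty
        return d

    print(categorical_distance((53.2, 80.1, 67.2), (30.0, 50.0, 30.0)))   # same category: no penalty
    print(categorical_distance((53.2, 80.1, 67.2), (87.7, -86.2, 83.2)))  # different categories: penalty added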

  • 190.
    Nakajima, Masayuki
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Miyai, Ayumi
    Tokyo University.
    Yamaguchi, Yasushi
    Tokyo University.
    How to Evaluate Learning Outcomes of Stereoscopic 3D Computer Graphics by Scene Rendering2013In: ITE Technical Report: Vol. 37, No. 45, ME2013-117, ITE, 2013, p. 21-24Conference paper (Other (popular science, discussion, etc.))
    Abstract [en]

    Use of stereoscopic 3DCG (S3DCG) is increasing in movies, games and animation. However, no method for objectively evaluating production capability has been established. If production capability could be measured on the basis of certain criteria, a unified evaluation would be useful in schools, as well as for human resource development and recruitment in industry. We therefore conducted practical tests in which subjects used 3DCG software to create an S3DCG scene. The tests were carried out before and after the subjects' learning. As a result, we were able to measure the improvement in each subject's capability after learning, as well as the differences in capability between subjects. In this paper, we report on the experimental method and results.

  • 191.
    Nakajima, Masayuki
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Ono, Sumiaki
    Andre, Alexis
    Chang, Youngha
    Tokyo City University.
    Automatic Generation of LEGO from the Polygonal data2013In: IWAIT2013 Nagoya, 2013, p. 262-267Conference paper (Refereed)
    Abstract [en]

    In this work, we propose a method that converts a 3D polygonal model into a corresponding LEGO brick assembly. For this, we first convert the polygonal model into a voxel model, and then convert it to the brick representation. The difficulty lies in guaranteeing the connection between bricks. To achieve this, we define a replacement priority, and the conversion from voxel to brick representation is done according to this priority. We show some experimental results, which demonstrate that our method can keep the connection and achieves a robust and optimized method for assembling LEGO building bricks.
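
    A minimal sketch of the voxel-to-brick step, under strong simplifying assumptions: a single 2D voxel layer, only 1xN bricks, and a replacement priority that simply prefers longer bricks. The paper's actual priority must also guarantee the connections between layers.

    import numpy as np

    BRICK_LENGTHS = [4, 3, 2, 1]   # replacement priority: longest brick first

    def layer_to_bricks(layer):
        """Greedily cover the filled cells of one voxel layer with 1xN bricks."""
        covered = np.zeros_like(layer, dtype=bool)
        bricks = []
        for y, x in np.argwhere(layer):
            if covered[y, x]:
                continue
            for n in BRICK_LENGTHS:
                span = layer[y, x:x + n]
                if len(span) == n and span.all() and not covered[y, x:x + n].any():
                    covered[y, x:x + n] = True
                    bricks.append((int(y), int(x), n))
                    break
        return bricks

    layer = np.array([[1, 1, 1, 1, 0],
                      [1, 1, 0, 1, 1]])
    print(layer_to_bricks(layer))   # -> [(0, 0, 4), (1, 0, 2), (1, 3, 2)]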

  • 192.
    Nakajima, Masayuki
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of Game Design.
    Ono, Sumiaki
    Chang, Youngha
    Tokyo City University.
    Andre, Alexis
    LEGO Builder: Automatic Generation of LEGO Assembly Manual from 3D Polygon Model2013In: ITE English Journal, ISSN 1342-6893, Vol. 1, no 4, p. 354-360Article in journal (Refereed)
    Abstract [en]

    The LEGO brick system is one of the most popular toys in the world. It can stimulate one's creativity while being lots of fun. It is however very hard for the naive user to assemble complex models without instructions. In this work, we propose a method that converts 3D polygonal models into LEGO brick building instructions automatically. The most important part of the conversion is that the connectivity between the bricks should be assured. For this, we introduce a graph structure named "legograph" that allows us to generate physically sound models that do not fall apart, by managing the connections between the bricks. We show some experimental and evaluation results. These show that the 3D brick models built by following the instructions generated by our method do not fall apart, and that one can learn how to efficiently build 3D structures from our instructions.
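
    The following Python sketch illustrates the kind of connectivity check a legograph enables, with bricks as nodes and stud overlaps between adjacent layers as edges. The brick representation and the rules here are invented for illustration and are not the paper's data structure.

    from collections import deque

    def overlaps(b1, b2):
        """Bricks as (layer, x, length); studs connect across one layer."""
        l1, x1, n1 = b1
        l2, x2, n2 = b2
        return abs(l1 - l2) == 1 and x1 < x2 + n2 and x2 < x1 + n1

    def is_connected(bricks):
        """A physically sound model: its brick graph is one component."""
        if not bricks:
            return True
        adj = {b: [o for o in bricks if o is not b and overlaps(b, o)]
               for b in bricks}
        seen, queue = {bricks[0]}, deque([bricks[0]])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return len(seen) == len(bricks)

    print(is_connected([(0, 0, 4), (1, 2, 4), (2, 5, 2)]))  # True: a chain
    print(is_connected([(0, 0, 2), (0, 4, 2)]))             # False: two islands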

  • 193.
    Ngo, Tuan-Phong
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
    Model Checking of Software Systems under Weak Memory Models2019Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    When a program is compiled and run on a modern architecture, different optimizations may be applied to gain efficiency. In particular, the access operations (e.g., read and write) to the shared memory may be performed in an out-of-order manner, i.e., in a different order than the order in which the operations were issued by the program. The reordering of memory access operations leads to efficient use of instruction pipelines and thus an improvement in program execution times. However, this gain in efficiency comes at a price. More precisely, programs running on modern architectures may exhibit behaviors that are unexpected by programmers. Out-of-order execution has led to new program semantics, called weak memory models (WMMs). One crucial problem is to ensure the correctness of concurrent programs running under weak memory models.

    The thesis proposes three techniques for reasoning about and analyzing concurrent programs running under WMMs. The first one is a sound and complete analysis technique for finite-state programs running under the TSO semantics (Paper II). This technique is based on a novel and equivalent semantics for TSO, called Dual TSO semantics, and on the framework of well-structured transition systems. The second technique is an under-approximation technique that can be used to detect bugs under the POWER semantics (Paper III). This technique is based on bounding the number of contexts in an explored execution, where in each context there is only one active process. The third technique is also an under-approximation technique, based on systematic testing (a.k.a. stateless model checking). This approach has been used to develop an optimal and efficient systematic testing approach for concurrent programs running under the Release-Acquire semantics (Paper IV).

    The thesis also considers the problem of effectively finding a minimal set of fences that guarantees the correctness of a concurrent program running under WMMs (Paper I). A fence (a.k.a. barrier) is an operation that can be inserted in the program to prohibit certain reorderings between operations issued before and after the fence. Since fences are expensive, it is crucial to automatically find a minimal set of fences to ensure the program correctness. This thesis presents a method for automatic fence insertion in programs running under the TSO semantics that offers the best-known trade-off between the efficiency and optimality of the algorithm. The technique is based on a novel notion of correctness, called Persistence, that compares the behaviors of a program running under WMMs to that running under the SC semantics.
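
    A classic way to see why fences matter under TSO is the store-buffering (Dekker) litmus test. The toy Python model below simulates per-process FIFO store buffers and shows that the outcome r1 = r2 = 0, forbidden under SC, appears without fences and disappears when each process drains its buffer before reading. This is only an illustration of the semantics, not the thesis's formal model.

    from collections import deque

    class TSO:
        def __init__(self):
            self.mem = {"x": 0, "y": 0}
            self.buf = {1: deque(), 2: deque()}

        def write(self, p, var, val):
            self.buf[p].append((var, val))        # buffered, not yet globally visible

        def read(self, p, var):
            for v, val in reversed(self.buf[p]):  # a process sees its own buffer first
                if v == var:
                    return val
            return self.mem[var]

        def fence(self, p):
            while self.buf[p]:                    # drain the store buffer to memory
                var, val = self.buf[p].popleft()
                self.mem[var] = val

    m = TSO()
    m.write(1, "x", 1); r1 = m.read(1, "y")       # process 1
    m.write(2, "y", 1); r2 = m.read(2, "x")       # process 2
    print(r1, r2)                                 # -> 0 0, the weak outcome

    m = TSO()
    m.write(1, "x", 1); m.fence(1); r1 = m.read(1, "y")
    m.write(2, "y", 1); m.fence(2); r2 = m.read(2, "x")
    print(r1, r2)                                 # fences forbid the 0 0 outcome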

    List of papers
    1. The Best of Both Worlds: Trading efficiency and optimality in fence insertion for TSO
    Open this publication in new window or tab >>The Best of Both Worlds: Trading efficiency and optimality in fence insertion for TSO
    2015 (English)In: Programming Languages and Systems: ESOP 2015, Springer Berlin/Heidelberg, 2015, p. 308-332Conference paper, Published paper (Refereed)
    Abstract [en]

    We present a method for automatic fence insertion in concurrent programs running under weak memory models that provides the best known trade-off between efficiency and optimality. On the one hand, the method can efficiently handle complex aspects of program behaviors such as unbounded buffers and large numbers of processes. On the other hand, it is able to find small sets of fences needed for ensuring correctness of the program. To this end, we propose a novel notion of correctness, called persistence, that compares the behavior of the program under the weak memory semantics with that under the classical interleaving (SC) semantics. We instantiate our framework for the Total Store Ordering (TSO) memory model, and give an algorithm that reduces the fence insertion problem under TSO to the reachability problem for programs running under SC. Furthermore, we provide an abstraction scheme that substantially increases scalability to large numbers of processes. Based on our method, we have implemented a tool and run it successfully on a wide range of benchmarks.

    Place, publisher, year, edition, pages
    Springer Berlin/Heidelberg, 2015
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743 ; 9032
    Keywords
    weak memory, correctness, verification, TSO, concurrent program
    National Category
    Computer Sciences
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-253645 (URN)10.1007/978-3-662-46669-8_13 (DOI)000361751400013 ()978-3-662-46668-1 (ISBN)
    Conference
    24th European Symposium on Programming, ESOP 2015, April 11–18, London, UK
    Projects
    UPMARC
    Available from: 2015-05-29 Created: 2015-05-29 Last updated: 2018-11-21
    2. A load-buffer semantics for total store ordering
    Open this publication in new window or tab >>A load-buffer semantics for total store ordering
    2018 (English)In: Logical Methods in Computer Science, ISSN 1860-5974, E-ISSN 1860-5974, Vol. 14, no 1, article id 9Article in journal (Refereed) Published
    Abstract [en]

    We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove the correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes. 

    In this paper, we introduce an alternative (yet equivalent) semantics to the classical one for TSO that is more amenable to efficient algorithmic verification and to the extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers. The flow of information is now from the memory to the load buffers. We show that this new semantics allows us (1) to simplify drastically the safety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared to existing procedures, and (3) to easily extend the decision procedure to the parametric case, which yields a new decidability result and, more importantly, a verification algorithm that is more general and more efficient in practice than the one for bounded instances.

    Keywords
    Verification, TSO, concurrent program, safety property, well-structured transition system
    National Category
    Computer Sciences
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-337278 (URN)000426512000008 ()
    Projects
    UPMARC
    Available from: 2018-01-23 Created: 2017-12-21 Last updated: 2018-11-21
    3. Context-bounded analysis for POWER
    Open this publication in new window or tab >>Context-bounded analysis for POWER
    2017 (English)In: Tools and Algorithms for the Construction and Analysis of Systems: Part II, Springer, 2017, p. 56-74Conference paper, Published paper (Refereed)
    Abstract [en]

    We propose an under-approximate reachability analysis algorithm for programs running under the POWER memory model, in the spirit of the work on context-bounded analysis initiated by Qadeer et al. in 2005 for detecting bugs in concurrent programs (supposed to be running under the classical SC model). To that end, we first introduce a new notion of context-bounding that is suitable for reasoning about computations under POWER, which generalizes the one defined by Atig et al. in 2011 for the TSO memory model. Then, we provide a polynomial size reduction of the context-bounded state reachability problem under POWER to the same problem under SC: Given an input concurrent program P, our method produces a concurrent program P' such that, for a fixed number of context switches, running P' under SC yields the same set of reachable states as running P under POWER. The generated program P' contains the same number of processes as P and operates on the same data domain. By leveraging the standard model checker CBMC, we have implemented a prototype tool and applied it on a set of benchmarks, showing the feasibility of our approach.

    Place, publisher, year, edition, pages
    Springer, 2017
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10206
    Keywords
    POWER, weak memory model, under approximation, translation, concurrent program, testing
    National Category
    Computer Systems
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-314901 (URN)10.1007/978-3-662-54580-5_4 (DOI)000440733400004 ()978-3-662-54579-9 (ISBN)
    Conference
    23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2017, April 22–29, Uppsala, Sweden
    Projects
    UPMARC
    Available from: 2017-03-31 Created: 2017-02-07 Last updated: 2018-11-21Bibliographically approved
    4. Optimal Stateless Model Checking under the Release-Acquire Semantics
    Open this publication in new window or tab >>Optimal Stateless Model Checking under the Release-Acquire Semantics
    2018 (English)In: SPLASH OOPSLA 2018, Boston, Nov 4-9, 2018, ACM Digital Library, 2018Conference paper, Published paper (Refereed)
    Abstract [en]

    We present a framework for efficient application of stateless model checking (SMC) to concurrent programs running under the Release-Acquire (RA) fragment of the C/C++11 memory model. Our approach is based on exploring the possible program orders, which define the order in which instructions of a thread are executed, and read-from relations, which define how reads obtain their values from writes. This is in contrast to previous approaches, which in addition explore the possible coherence orders, i.e., orderings between conflicting writes. Since unexpected test results such as program crashes or assertion violations depend only on the read-from relation, we avoid a potentially large source of redundancy. Our framework is based on a novel technique for determining whether a particular read-from relation is feasible under the RA semantics. We define an SMC algorithm which is provably optimal in the sense that it explores each program order and read-from relation exactly once. This optimality result is strictly stronger than previous analogous optimality results, which also take coherence order into account. We have implemented our framework in the tool Tracer. Experiments show that Tracer can be significantly faster than state-of-the-art tools that can handle the RA semantics.
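
    To make the size of the search space concrete, the toy Python snippet below enumerates read-from relations only, i.e. one choice of observed write per read, without ever enumerating coherence orders. The real algorithm additionally checks each candidate for consistency under RA, which this sketch omits, and the program here is invented.

    from itertools import product

    # writes[v] = values written to variable v (the initial 0 plus one write)
    writes = {"x": [0, 1], "y": [0, 2]}
    reads = [("r1", "x"), ("r2", "y"), ("r3", "x")]

    # One read-from relation = one choice of observed write per read.
    for rf in product(*(writes[var] for _, var in reads)):
        print({name: val for (name, _), val in zip(reads, rf)})
    # 2 * 2 * 2 = 8 read-from relations, each explored exactly once,
    # independently of how the conflicting writes are coherence-ordered.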

    Place, publisher, year, edition, pages
    ACM Digital Library, 2018
    Keywords
    Software model checking, C/C++11, Release-Acquire, Concurrent program
    National Category
    Computer Systems
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-358241 (URN)
    Conference
    SPLASH OOPSLA 2018
    Projects
    UPMARC
    Available from: 2018-08-26 Created: 2018-08-26 Last updated: 2019-01-09Bibliographically approved
  • 194.
    Ngo, Tuan-Phong
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Abdulla, Parosh
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Jonsson, Bengt
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Atig, Mohamed Faouzi
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Optimal Stateless Model Checking under the Release-Acquire Semantics2018In: SPLASH OOPSLA 2018, Boston, Nov 4-9, 2018, ACM Digital Library, 2018Conference paper (Refereed)
    Abstract [en]

    We present a framework for efficient application of stateless model checking (SMC) to concurrent programs running under the Release-Acquire (RA) fragment of the C/C++11 memory model. Our approach is based on exploring the possible program orders, which define the order in which instructions of a thread are executed, and read-from relations, which define how reads obtain their values from writes. This is in contrast to previous approaches, which in addition explore the possible coherence orders, i.e., orderings between conflicting writes. Since unexpected test results such as program crashes or assertion violations depend only on the read-from relation, we avoid a potentially large source of redundancy. Our framework is based on a novel technique for determining whether a particular read-from relation is feasible under the RA semantics. We define an SMC algorithm which is provably optimal in the sense that it explores each program order and read-from relation exactly once. This optimality result is strictly stronger than previous analogous optimality results, which also take coherence order into account. We have implemented our framework in the tool Tracer. Experiments show that Tracer can be significantly faster than state-of-the-art tools that can handle the RA semantics.

  • 195.
    Nikoleris, Nikos
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Efficient Memory Modeling During Simulation and Native Execution2019Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Application performance on computer processors depends on a number of complex architectural and microarchitectural design decisions. Consequently, computer architects rely on performance modeling to improve future processors without building prototypes. This thesis focuses on performance modeling and proposes methods that quantify the impact of the memory system on application performance.

    Detailed architectural simulation, a common approach to performance modeling, can be five orders of magnitude slower than execution on the actual processor. At this rate, simulating realistic workloads requires years of CPU time. Prior research uses sampling to speed up simulation. Using sampled simulation, only a number of small but representative portions of the workload are evaluated in detail. To fully exploit the speed potential of sampled simulation, the simulation method has to efficiently reconstruct the architectural and microarchitectural state prior to the simulation samples. Practical approaches to sampled simulation use either functional simulation at the expense of performance or checkpoints at the expense of flexibility. This thesis proposes three approaches that use statistical cache modeling to efficiently address the problem of cache warm up and speed up sampled simulation, without compromising flexibility. The statistical cache model uses sparse memory reuse information obtained with native techniques to model the performance of the cache. The proposed sampled simulation framework evaluates workloads 150 times faster than approaches that use functional simulation to warm up the cache.

    Other approaches to performance modeling use analytical models based on data obtained from execution on native hardware. These native techniques allow for better understanding of the performance bottlenecks on existing hardware. Efficient resource utilization in modern multicore processors is necessary to exploit their peak performance. This thesis proposes native methods that characterize shared resource utilization in modern multicores. These methods quantify the impact of cache sharing and off-chip memory sharing on overall application performance. Additionally, they can quantify scalability bottlenecks for data-parallel, symmetric workloads.
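
    The statistical cache model at the heart of the sampled-simulation work can be illustrated with a drastically simplified Python sketch: assuming a fully associative LRU cache and already-computed stack distances (the thesis works from cheaper, sparsely sampled reuse information), an access hits if and only if its stack distance is smaller than the cache size, and a sparse sample predicts the full trace's miss ratio well.

    import random

    def miss_ratio(stack_distances, cache_lines):
        """An access with stack distance d hits a fully associative
        LRU cache of `cache_lines` lines iff d < cache_lines."""
        misses = sum(d >= cache_lines for d in stack_distances)
        return misses / len(stack_distances)

    # Synthetic, heavy-tailed stack distances stand in for a real trace.
    trace = [random.paretovariate(1.2) for _ in range(100_000)]
    sample = random.sample(trace, 1_000)   # sparse sampling
    for size in (256, 1024, 4096):
        print(size, round(miss_ratio(trace, size), 3),
              round(miss_ratio(sample, size), 3))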

    List of papers
    1. Extending statistical cache models to support detailed pipeline simulators
    Open this publication in new window or tab >>Extending statistical cache models to support detailed pipeline simulators
    2014 (English)In: 2014 IEEE International Symposium On Performance Analysis Of Systems And Software (Ispass), IEEE Computer Society, 2014, p. 86-95Conference paper, Published paper (Refereed)
    Abstract [en]

    Simulators are widely used in computer architecture research. While detailed cycle-accurate simulations provide useful insights, studies using modern workloads typically require days or weeks. Evaluating many design points only exacerbates the simulation overhead. Recent works propose methods with good accuracy that reduce the simulation overhead either by sampling the execution (e.g., SMARTS and SimPoint) or by using fast analytical models of the simulated designs (e.g., Interval Simulation). While these techniques significantly reduce the simulation overhead, modeling processor components with large state, such as the last-level cache, requires costly simulation to warm them up. Statistical simulation methods, such as SMARTS, report that the warm-up overhead accounts for 99% of the simulation overhead, while only 1% of the time is spent simulating the target design. This paper proposes WarmSim, a method that eliminates the need to warm up the cache. WarmSim builds on top of a statistical cache modeling technique and extends it to accurately model not only the miss ratio but also the outcome of every cache request. WarmSim uses as input an application's memory reuse information, which is hardware independent. Therefore, different cache configurations can be simulated using the same input data. We demonstrate that this approach can be used to estimate the CPI of the SPEC CPU2006 benchmarks with an average error of 1.77%, reducing the overhead compared to a simulation with a 10M-instruction warm-up by a factor of 50x.

    Place, publisher, year, edition, pages
    IEEE Computer Society, 2014
    Series
    IEEE International Symposium on Performance Analysis of Systems and Software-ISPASS
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-224221 (URN)10.1109/ISPASS.2014.6844464 (DOI)000364102000010 ()978-1-4799-3604-5 (ISBN)
    Conference
    ISPASS 2014, March 23-25, Monterey, CA
    Projects
    UPMARC
    Available from: 2014-05-06 Created: 2014-05-06 Last updated: 2018-12-14Bibliographically approved
    2. CoolSim: Statistical Techniques to Replace Cache Warming with Efficient, Virtualized Profiling
    Open this publication in new window or tab >>CoolSim: Statistical Techniques to Replace Cache Warming with Efficient, Virtualized Profiling
    2016 (English)In: Proceedings Of 2016 International Conference On Embedded Computer Systems: Architectures, Modeling And Simulation (Samos) / [ed] Najjar, W Gerstlauer, A, IEEE , 2016, p. 106-115Conference paper, Published paper (Refereed)
    Abstract [en]

    Simulation is an important part of the evaluation of next-generation computing systems. Detailed, cycle-accurate simulation, however, can be very slow when evaluating realistic workloads on modern microarchitectures. Sampled simulation (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more through the reduction of large workloads into a small but representative sample. Additionally, the execution state just prior to a simulation sample can be stored into checkpoints, allowing for fast restoration and evaluation. Unfortunately, changes in software, architecture or fundamental pieces of the microarchitecture (e.g., hardware-software co-design) require checkpoint regeneration. The end result for co-design degenerates to creating checkpoints for each modification, a task checkpointing was designed to eliminate. Therefore, a solution is needed that allows for fast and accurate simulation, without the need for checkpoints. Virtualized fast-forwarding (VFF), an alternative to using checkpoints, allows for execution at near-native speed between simulation points. Warming the microarchitectural state prior to each simulation point, however, requires functional simulation, a costly operation for large caches (e.g., 8 MB). Simulating future systems with caches of many MBs can require warming of billions of instructions, dominating simulation time. This paper proposes CoolSim, an efficient simulation framework that eliminates cache warming. CoolSim uses VFF to advance between simulation points while collecting sparse memory reuse information (MRI). The MRI is collected more than an order of magnitude faster than functional simulation. At the simulation point, detailed simulation with a statistical cache model is used to evaluate the design. The previously acquired MRI is used to estimate whether each memory request hits in the cache. The MRI is an architecturally independent metric and a single profile can be used in simulations of any cache size. We describe a prototype implementation of CoolSim based on KVM and gem5 running 19x faster than the state-of-the-art sampled simulation, while it estimates the CPI of the SPEC CPU2006 benchmarks with 3.62% error on average, across a wide range of cache sizes.

    Place, publisher, year, edition, pages
    IEEE, 2016
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-322061 (URN)000399143000015 ()9781509030767 (ISBN)
    Conference
    International Conference on Embedded Computer Systems - Architectures, Modeling and Simulation (SAMOS), JUL 17-21, 2016, Samos, GREECE
    Funder
    Swedish Foundation for Strategic Research EU, FP7, Seventh Framework Programme, 610490
    Available from: 2017-05-16 Created: 2017-05-16 Last updated: 2018-12-14Bibliographically approved
    3. Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation
    Open this publication in new window or tab >>Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation
    2018 (English)Report (Other academic)
    Abstract [en]

    Current practice for accurate and efficient simulation (e.g., SMARTS and Simpoint) makes use of sampling to significantly reduce the time needed to evaluate new research ideas. By evaluating a small but representative portion of the original application, sampling can allow for both fast and accurate performance analysis. However, as cache sizes of modern architectures grow, simulation time is dominated by warming microarchitectural state and not by detailed simulation, reducing overall simulation efficiency. While checkpoints can significantly reduce cache warming, improving efficiency, they limit the flexibility of the system under evaluation, requiring new checkpoints for software updates (such as changes to the compiler and compiler flags) and many types of hardware modifications. An ideal solution would allow for accurate cache modeling for each simulation run without the need to generate rigid checkpointing data a priori.

    Enabling this new direction for fast and flexible simulation requires a combination of (1) a methodology that allows for hardware and software flexibility and (2) the ability to quickly and accurately model arbitrarily-sized caches. Current approaches that rely on checkpointing or statistical cache modeling require rigid, up-front state to be collected which needs to be amortized over a large number of simulation runs. These earlier methodologies are insufficient for our goals for improved flexibility. In contrast, our proposed methodology, Delorean, outlines a unique solution to this problem. The Delorean simulation methodology enables both flexibility and accuracy by quickly generating a targeted cache model for the next detailed region on the fly without the need for up-front simulation or modeling. More specifically, we propose a new, more accurate statistical cache modeling method that takes advantage of hardware virtualization to precisely determine the memory regions accessed and to minimize the time needed for data collection while maintaining accuracy.

    Delorean uses a multi-pass approach to understand the memory regions accessed by the next, upcoming detailed region. Our methodology collects the entire set of key memory accesses and, through fast virtualization techniques, progressively scans larger, earlier regions to learn more about these key accesses in an efficient way. Using these techniques, we demonstrate that Delorean allows for the fast evaluation of systems and their software through the generation of accurate cache models on the fly. Delorean outperforms previous proposals by an order of magnitude, with a simulation speed of 150 MIPS and a similar average CPI error (below 4%).

    Publisher
    p. 12
    Series
    Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203
    National Category
    Computer Systems
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-369320 (URN)
    Available from: 2018-12-12 Created: 2018-12-12 Last updated: 2019-01-08Bibliographically approved
    4. Cache Pirating: Measuring the Curse of the Shared Cache
    Open this publication in new window or tab >>Cache Pirating: Measuring the Curse of the Shared Cache
    2011 (English)In: Proc. 40th International Conference on Parallel Processing, IEEE Computer Society, 2011, p. 165-175Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    IEEE Computer Society, 2011
    National Category
    Computer Engineering
    Identifiers
    urn:nbn:se:uu:diva-181254 (URN)10.1109/ICPP.2011.15 (DOI)978-1-4577-1336-1 (ISBN)
    Conference
    ICPP 2011
    Projects
    UPMARCCoDeR-MP
    Available from: 2011-10-17 Created: 2012-09-20 Last updated: 2018-12-14Bibliographically approved
    5. Bandwidth Bandit: Quantitative Characterization of Memory Contention
    Open this publication in new window or tab >>Bandwidth Bandit: Quantitative Characterization of Memory Contention
    2013 (English)In: Proc. 11th International Symposium on Code Generation and Optimization: CGO 2013, IEEE Computer Society, 2013, p. 99-108Conference paper, Published paper (Refereed)
    Abstract [en]

    On multicore processors, co-executing applications compete for shared resources, such as cache capacity and memory bandwidth. This leads to suboptimal resource allocation and can cause substantial performance loss, which makes it important to effectively manage these shared resources. This, however, requires insights into how the applications are impacted by such resource sharing. While there are several methods to analyze the performance impact of cache contention, less attention has been paid to general, quantitative methods for analyzing the impact of contention for memory bandwidth. To this end we introduce the Bandwidth Bandit, a general, quantitative, profiling method for analyzing the performance impact of contention for memory bandwidth on multicore machines. The profiling data captured by the Bandwidth Bandit is presented in a bandwidth graph. This graph accurately captures the measured application's performance as a function of its available memory bandwidth, and enables us to determine how much the application suffers when its available bandwidth is reduced. To demonstrate the value of this data, we present a case study in which we use the bandwidth graph to analyze the performance impact of memory contention when co-running multiple instances of a single-threaded application.
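
    Conceptually, the Bandit measures the target while co-running threads that do nothing but consume memory bandwidth. The Python sketch below mimics this with NumPy streaming threads (NumPy releases the GIL during large reductions, so the threads really do contend for bandwidth); the real method instead pins threads to cores and reads hardware performance counters, and all sizes here are arbitrary.

    import threading, time
    import numpy as np

    def bandit(stop, buf):
        while not stop.is_set():
            buf.sum()                    # stream the array: costs memory bandwidth

    def timed(target, n_bandits):
        """Run `target` while n_bandits threads steal bandwidth."""
        stop = threading.Event()
        bufs = [np.ones(50_000_000 // 8) for _ in range(n_bandits)]  # ~50 MB each
        threads = [threading.Thread(target=bandit, args=(stop, b)) for b in bufs]
        for t in threads: t.start()
        t0 = time.perf_counter(); target(); dt = time.perf_counter() - t0
        stop.set()
        for t in threads: t.join()
        return dt

    data = np.random.rand(20_000_000)
    for n in range(4):                   # more bandits -> less bandwidth left over
        print(n, "bandits:", round(timed(lambda: data.sum(), n), 3), "s")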

    Place, publisher, year, edition, pages
    IEEE Computer Society, 2013
    Keywords
    bandwidth, memory, caches
    National Category
    Computer Sciences
    Research subject
    Computer Science
    Identifiers
    urn:nbn:se:uu:diva-194101 (URN)10.1109/CGO.2013.6494987 (DOI)000318700200010 ()978-1-4673-5524-7 (ISBN)
    Conference
    CGO 2013, 23-27 February, Shenzhen, China
    Projects
    UPMARC
    Funder
    Swedish Research Council
    Available from: 2013-04-18 Created: 2013-02-08 Last updated: 2018-12-14Bibliographically approved
    6. A software based profiling method for obtaining speedup stacks on commodity multi-cores
    Open this publication in new window or tab >>A software based profiling method for obtaining speedup stacks on commodity multi-cores
    2014 (English)In: 2014 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS): ISPASS 2014, IEEE Computer Society, 2014, p. 148-157Conference paper, Published paper (Refereed)
    Abstract [en]

    A key goodness metric of multi-threaded programs is how their execution times scale when increasing the number of threads. However, there are several bottlenecks that can limit the scalability of a multi-threaded program, such as contention for shared cache capacity and off-chip memory bandwidth, and synchronization overheads. In order to improve the scalability of a multi-threaded program, it is vital to be able to quantify how the program is impacted by these scalability bottlenecks. We present a software profiling method for obtaining speedup stacks. A speedup stack reports how much each scalability bottleneck limits the scalability of a multi-threaded program. It thereby quantifies how much its scalability can be improved by eliminating a given bottleneck. A software developer can use this information to determine what optimizations are most likely to improve scalability, while a computer architect can use it to analyze the resource demands of emerging workloads. The proposed method profiles the program on real commodity multi-cores (i.e., no simulations required) using existing performance counters. Consequently, the obtained speedup stacks accurately account for all idiosyncrasies of the machine on which the program is profiled. While the main contribution of this paper is the profiling method to obtain speedup stacks, we present several examples of how speedup stacks can be used to analyze the resource requirements of multi-threaded programs. Furthermore, we discuss how their scalability can be improved by both software developers and computer architects.
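
    The arithmetic behind a speedup stack is simple and worth making explicit: the gap between ideal and measured speedup is split among the measured bottlenecks. The numbers in this Python sketch are hypothetical, and deriving the per-bottleneck losses from performance counters is exactly the hard part the paper addresses.

    threads = 8
    ideal = threads                        # perfect scaling
    measured = 4.6                         # hypothetical measured speedup
    # Hypothetical losses (in speedup units) attributed per bottleneck:
    losses = {"shared cache contention": 1.3,
              "memory bandwidth": 1.5,
              "synchronization": 0.6}
    # The stack must account for the whole gap between ideal and measured.
    assert abs(ideal - measured - sum(losses.values())) < 1e-6
    for cause, loss in sorted(losses.items(), key=lambda kv: -kv[1]):
        print(f"{cause:25s} costs {loss:.1f} of {ideal} ideal speedup")
    print(f"{'measured speedup':25s} {measured:.1f}")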

    Place, publisher, year, edition, pages
    IEEE Computer Society, 2014
    Series
    IEEE International Symposium on Performance Analysis of Systems and Software-ISPASS
    National Category
    Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-224230 (URN)10.1109/ISPASS.2014.6844479 (DOI)000364102000025 ()978-1-4799-3604-5 (ISBN)
    Conference
    ISPASS 2014, March 23-25, Monterey, CA
    Projects
    UPMARC
    Available from: 2014-05-06 Created: 2014-05-06 Last updated: 2018-12-14Bibliographically approved
  • 196.
    Nikoleris, Nikos
    et al.
    Arm Research, Cambridge UK.
    Hagersten, Erik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Carlson, Trevor E.
    Department of Computer Science, National University of Singapore.
    Delorean: Virtualized Directed Profiling for Cache Modeling in Sampled Simulation2018Report (Other academic)
    Abstract [en]

    Current practice for accurate and efficient simulation (e.g., SMARTS and Simpoint) makes use of sampling to significantly reduce the time needed to evaluate new research ideas. By evaluating a small but representative portion of the original application, sampling can allow for both fast and accurate performance analysis. However, as cache sizes of modern architectures grow, simulation time is dominated by warming microarchitectural state and not by detailed simulation, reducing overall simulation efficiency. While checkpoints can significantly reduce cache warming, improving efficiency, they limit the flexibility of the system under evaluation, requiring new checkpoints for software updates (such as changes to the compiler and compiler flags) and many types of hardware modifications. An ideal solution would allow for accurate cache modeling for each simulation run without the need to generate rigid checkpointing data a priori.

    Enabling this new direction for fast and flexible simulation requires a combination of (1) a methodology that allows for hardware and software flexibility and (2) the ability to quickly and accurately model arbitrarily-sized caches. Current approaches that rely on checkpointing or statistical cache modeling require rigid, up-front state to be collected which needs to be amortized over a large number of simulation runs. These earlier methodologies are insufficient for our goals for improved flexibility. In contrast, our proposed methodology, Delorean, outlines a unique solution to this problem. The Delorean simulation methodology enables both flexibility and accuracy by quickly generating a targeted cache model for the next detailed region on the fly without the need for up-front simulation or modeling. More specifically, we propose a new, more accurate statistical cache modeling method that takes advantage of hardware virtualization to precisely determine the memory regions accessed and to minimize the time needed for data collection while maintaining accuracy.

    Delorean uses a multi-pass approach to understand the memory regions accessed by the next, upcoming detailed region. Our methodology collects the entire set of key memory accesses and, through fast virtualization techniques, progressively scans larger, earlier regions to learn more about these key accesses in an efficient way. Using these techniques, we demonstrate that Delorean allows for the fast evaluation of systems and their software through the generation of accurate cache models on the fly. Delorean outperforms previous proposals by an order of magnitude, with a simulation speed of 150 MIPS and a similar average CPI error (below 4%).

  • 197. Noda, Claro
    et al.
    Prabh, Shashi
    Alves, Mario
    Voigt, Thiemo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    On Packet Size and Error Correction Optimisations in Low-Power Wireless Networks2013In: IEEE International Conference on Sensing, Communication and Networking (IEEE SECON), 2013Conference paper (Refereed)
  • 198. Noda, Claro
    et al.
    Prabh, Shashi
    Boano, Carlo Alberto
    Voigt, Thiemo
    Alves, Mário
    Poster abstract: A channel quality metric for interference-aware wireless sensor networks2011In: IPSN, 2011, p. 167-168Conference paper (Refereed)
  • 199.
    Norgren, Magnus
    Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences.
    Wishbone compliant smart Pulse-Width Modulation (PWM) IP: Uppsala Universitet - ÅAC Microtec AB2012Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
  • 200.
    Olofsson, Simon
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control.
    Probabilistic Feature Learning Using Gaussian Process Auto-Encoders2016Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    The focus of this report is the problem of probabilistic dimensionality reduction and feature learning from high-dimensional data (images). Extracting features and being able to learn from high-dimensional sensory data is an important ability in a general-purpose intelligent system. Dimensionality reduction and feature learning have in the past primarily been done using (convolutional) neural networks or linear mappings, e.g. in principal component analysis. However, these methods do not yield any error bars in the features or predictions. In this report, theory and a model for how dimensionality reduction and feature learning can be done using Gaussian process auto-encoders (GP-AEs) are presented. By using GP-AEs, the variance in the feature space is computed, thus yielding a measure of the uncertainty in the constructed model. This measure is useful in order to avoid making over-confident system predictions. Results show that GP-AEs are capable of dimensionality reduction and feature learning, but that they suffer from scalability issues and from weak gradient signal propagation. The reconstruction quality does not match that of state-of-the-art methods, and the model takes very long to train. The model nevertheless has potential, since it can scale to large inputs.
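
    A minimal NumPy sketch of one Gaussian process regression layer shows the property the thesis exploits: a GP returns a predictive variance, i.e. error bars, alongside every prediction. A GP-AE composes such mappings into an encoder and decoder; the kernel and hyperparameters below are arbitrary.

    import numpy as np

    def rbf(A, B, ell=1.0, sf2=1.0):
        """Squared-exponential kernel between two sets of points."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf2 * np.exp(-0.5 * d2 / ell**2)

    def gp_predict(X, y, Xs, noise=1e-2):
        """Standard GP regression: predictive mean and variance at Xs."""
        K = rbf(X, X) + noise * np.eye(len(X))
        Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        mean = Ks.T @ alpha
        v = np.linalg.solve(L, Ks)
        var = np.diag(Kss - v.T @ v) + noise
        return mean, var

    X = np.linspace(0, 5, 10)[:, None]
    y = np.sin(X[:, 0])
    Xs = np.linspace(-1, 6, 7)[:, None]
    mean, var = gp_predict(X, y, Xs)
    for m, s in zip(mean, np.sqrt(var)):
        print(f"{m:+.2f} +/- {2*s:.2f}")   # bars grow outside the training data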
