  • 1. Aartsen, M. G.
    et al.
    Abbasi, R.
    Ackermann, M.
    Adams, J.
    Aguilar, J. A.
    Ahlers, M.
    Altmann, D.
    Arguelles, C.
    Auffenberg, J.
    Bai, X.
    Baker, M.
    Barwick, S. W.
    Baum, V.
    Bay, R.
    Beatty, J. J.
    Tjus, J. B.
    Becker, K. H.
    BenZvi, S.
    Berghaus, P.
    Berley, D.
    Bernardini, E.
    Bernhard, A.
    Besson, D. Z.
    Binder, G.
    Bindig, D.
    Bissok, M.
    Blaufuss, E.
    Blumenthal, J.
    Boersma, D. J.
    Bohm, C.
    Bose, D.
    Boser, S.
    Botner, O.
    Brayeur, L.
    Bretz, H. P.
    Brown, A. M.
    Bruijn, R.
    Casey, J.
    Casier, M.
    Chirkin, D.
    Christov, A.
    Christy, B.
    Clark, K.
    Classen, L.
    Clevermann, F.
    Coenders, S.
    Cohen, S.
    Cowen, D. F.
    Silva, A. H. C.
    Danninger, M.
    Daughhetee, J.
    Davis, J. C.
    Day, M.
    De Clercq, C.
    De Ridder, S.
    Desiati, P.
    de Vries, K. D.
    de With, M.
    DeYoung, T.
    Diaz-Velez, J. C.
    Dunkman, M.
    Eagan, R.
    Eberhardt, B.
    Eichmann, B.
    Eisch, J.
    Euler, S.
    Evenson, P. A.
    Fadiran, O.
    Fazely, A. R.
    Fedynitch, A.
    Feintzeig, J.
    Feusels, T.
    Filimonov, K.
    Finley, C.
    Fischer-Wasels, T.
    Flis, S.
    Franckowiak, A.
    Frantzen, K.
    Fuchs, T.
    Gaisser, T. K.
    Gallagher, J.
    Gerhardt, L.
    Gladstone, L.
    Glusenkamp, T.
    Goldschmidt, A.
    Golup, G.
    Gonzalez, J. G.
    Goodman, J. A.
    Gora, D.
    Grandmont, D. T.
    Grant, D.
    Gretskov, P.
    Groh, J. C.
    Gross, A.
    Ha, C.
    Ismail, A. H.
    Hallen, P.
    Hallgren, A.
    Halzen, F.
    Hanson, K.
    Hebecker, D.
    Heereman, D.
    Heinen, D.
    Helbing, K.
    Hellauer, R.
    Hickford, S.
    Hill, G. C.
    Hoffman, K. D.
    Hoffmann, R.
    Homeier, A.
    Hoshina, K.
    Huang, F.
    Huelsnitz, W.
    Hulth, P. O.
    Hultqvist, K.
    Hussain, S.
    Ishihara, A.
    Jacobi, E.
    Jacobsen, J.
    Jagielski, K.
    Japaridze, G. S.
    Jero, K.
    Jlelati, O.
    Kaminsky, B.
    Kappes, A.
    Karg, T.
    Karle, A.
    Kauer, M.
    Kelley, J. L.
    Kiryluk, J.
    Klas, J.
    Klein, S. R.
    Kohne, J. H.
    Kohnen, G.
    Kolanoski, H.
    Kopke, L.
    Kopper, C.
    Kopper, S.
    Koskinen, D. J.
    Kowalski, M.
    Krasberg, M.
    Kriesten, A.
    Krings, K.
    Kroll, G.
    Kunnen, J.
    Kurahashi, N.
    Kuwabara, T.
    Labare, M.
    Landsman, H.
    Larson, M. J.
    Lesiak-Bzdak, M.
    Leuermann, M.
    Leute, J.
    Lunemann, J.
    Macias, O.
    Madsen, J.
    Maggi, G.
    Maruyama, R.
    Mase, K.
    Matis, H. S.
    McNally, F.
    Meagher, K.
    Merck, M.
    Merino, G.
    Meures, T.
    Miarecki, S.
    Middell, E.
    Milke, N.
    Miller, J.
    Mohrmann, L.
    Montaruli, T.
    Morse, R.
    Nahnhauer, R.
    Naumann, U.
    Niederhausen, H.
    Nowicki, S. C.
    Nygren, D. R.
    Obertacke, A.
    Odrowski, S.
    Olivas, A.
    Omairat, A.
    O'Murchadha, A.
    Paul, L.
    Pepper, J. A.
    de los Heros, C. P.
    Pfendner, C.
    Pieloth, D.
    Pinat, E.
    Posselt, J.
    Price, P. B.
    Przybylski, G. T.
    Quinnan, M.
    Radel, L.
    Rae, I.
    Rameez, M.
    Rawlins, K.
    Redl, P.
    Reimann, R.
    Resconi, E.
    Rhode, W.
    Ribordy, M.
    Richman, M.
    Riedel, B.
    Rodrigues, J. P.
    Rott, C.
    Ruhe, T.
    Ruzybayev, B.
    Ryckbosch, D.
    Saba, S. M.
    Sander, H. G.
    Santander, M.
    Sarkar, S.
    Schatto, K.
    Scheriau, F.
    Schmidt, T.
    Schmitz, M.
    Schoenen, S.
    Schoneberg, S.
    Schonwald, A.
    Schukraft, A.
    Schulte, L.
    Schultz, D.
    Schulz, O.
    Secke, D.
    Sestayo, Y.
    Seunarine, S.
    Shanidze, R.
    Sheremata, C.
    Smith, M. W. E.
    Soldin, D.
    Spiczak, G. M.
    Spiering, C.
    Stamatikos, M.
    Stanev, T.
    Stanisha, N. A.
    Stasik, A.
    Stezelberger, T.
    Stokstad, R. G.
    Stossl, A.
    Strahler, E. A.
    Strom, R.
    Strotjohann, N. L.
    Sullivan, G. W.
    Taavola, H.
    Taboada, I.
    Tamburro, A.
    Tepe, A.
    Ter-Antonyan, S.
    Tesic, G.
    Tilav, S.
    Toale, P. A.
    Tobin, M. N.
    Toscano, S.
    Tselengidou, M.
    Unger, E.
    Usner, M.
    Vallecorsa, S.
    van Eijndhoven, N.
    van Overloop, A.
    van Santen, J.
    Vehring, M.
    Voge, M.
    Vraeghe, M.
    Walck, C.
    Waldenmaier, T.
    Wallraff, M.
    Weaver, C.
    Wellons, M.
    Wendt, C.
    Westerhoff, S.
    Whitehorn, N.
    Wiebe, K.
    Wiebusch, C. H.
    Williams, D. R.
    Wissing, H.
    Wolf, M.
    Wood, T. R.
    Woschnagg, K.
    Xu, D. L.
    Xu, X. W.
    Yanez, J. P.
    Yodh, G.
    Yoshida, S.
    Zarzhitsky, P.
    Ziemann, J.
    Zierke, S.
    Zoll, M.
    The IceProd framework: Distributed data processing for the IceCube neutrino observatory. 2015. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 75. Journal article (Refereed)
    Abstract [en]

    IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework. (C) 2014 Elsevier Inc. All rights reserved.

  • 2. Aartsen, M. G.
    et al.
    Abbasi, R.
    Ackermann, M.
    Adams, J.
    Aguilar, J. A.
    Ahlers, M.
    Altmann, D.
    Arguelles, C.
    Auffenberg, J.
    Bai, X.
    Baker, M.
    Barwick, S. W.
    Baum, V.
    Bay, R.
    Beatty, J. J.
    Tjus, J. Becker
    Becker, K. -H
    BenZvi, S.
    Berghaus, P.
    Berley, D.
    Bernardini, E.
    Bernhard, A.
    Besson, D. Z.
    Binder, G.
    Bindig, D.
    Bissok, M.
    Blaufuss, E.
    Blumenthal, J.
    Boersma, David J.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Bohm, C.
    Bose, D.
    Boeser, S.
    Botner, Olga
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Brayeur, L.
    Bretz, H. -P
    Brown, A. M.
    Bruijn, R.
    Casey, J.
    Casier, M.
    Chirkin, D.
    Christov, A.
    Christy, B.
    Clark, K.
    Classen, L.
    Clevermann, F.
    Coenders, S.
    Cohen, S.
    Cowen, D. F.
    Silva, A. H. Cruz
    Danninger, M.
    Daughhetee, J.
    Davis, J. C.
    Day, M.
    De Clercq, C.
    De Ridder, S.
    Desiati, P.
    de Vries, K. D.
    de With, M.
    DeYoung, T.
    Diaz-Velez, J. C.
    Dunkman, M.
    Eagan, R.
    Eberhardt, B.
    Eichmann, B.
    Eisch, J.
    Euler, S.
    Evenson, P. A.
    Fadiran, O.
    Fazely, A. R.
    Fedynitch, A.
    Feintzeig, J.
    Feusels, T.
    Filimonov, K.
    Finley, C.
    Fischer-Wasels, T.
    Flis, S.
    Franckowiak, A.
    Frantzen, K.
    Fuchs, T.
    Gaisser, T. K.
    Gallagher, J.
    Gerhardt, L.
    Gladstone, L.
    Glusenkamp, T.
    Goldschmidt, A.
    Golup, G.
    Gonzalez, J. G.
    Goodman, J. A.
    Gora, D.
    Grandmont, D. T.
    Grant, D.
    Gretskov, P.
    Groh, J. C.
    Gross, A.
    Ha, C.
    Ismail, A. Haj
    Hallen, P.
    Hallgren, Allan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Halzen, F.
    Hanson, K.
    Hebecker, D.
    Heereman, D.
    Heinen, D.
    Helbing, K.
    Hellauer, R.
    Hickford, S.
    Hill, G. C.
    Hoffman, K. D.
    Hoffmann, R.
    Homeier, A.
    Hoshina, K.
    Huang, F.
    Huelsnitz, W.
    Hulth, P. O.
    Hultqvist, K.
    Hussain, S.
    Ishihara, A.
    Jacobi, E.
    Jacobsen, J.
    Jagielski, K.
    Japaridze, G. S.
    Jero, K.
    Jlelati, O.
    Kaminsky, B.
    Kappes, A.
    Karg, T.
    Karle, A.
    Kauer, M.
    Kelley, J. L.
    Kiryluk, J.
    Klaes, J.
    Klein, S. R.
    Koehne, J. -H
    Kohnen, G.
    Kolanoski, H.
    Koepke, L.
    Kopper, C.
    Kopper, S.
    Koskinen, D. J.
    Kowalski, M.
    Krasberg, M.
    Kriesten, A.
    Krings, K.
    Kroll, G.
    Kunnen, J.
    Kurahashi, N.
    Kuwabara, T.
    Labare, M.
    Landsman, H.
    Larson, M. J.
    Lesiak-Bzdak, M.
    Leuermann, M.
    Leute, J.
    Luenemann, J.
    Macias, O.
    Madsen, J.
    Maggi, G.
    Maruyama, R.
    Mase, K.
    Matis, H. S.
    McNally, F.
    Meagher, K.
    Merck, M.
    Merino, G.
    Meures, T.
    Miarecki, S.
    Middell, E.
    Milke, N.
    Miller, J.
    Mohrmann, L.
    Montaruli, T.
    Morse, R.
    Nahnhauer, R.
    Naumann, U.
    Niederhausen, H.
    Nowicki, S. C.
    Nygren, D. R.
    Obertacke, A.
    Odrowski, S.
    Olivas, A.
    Omairat, A.
    O'Murchadha, A.
    Paul, L.
    Pepper, J. A.
    de los Heros, Carlos Perez
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Pfendner, C.
    Pieloth, D.
    Pinat, E.
    Posselt, J.
    Price, P. B.
    Przybylski, G. T.
    Quinnan, M.
    Raedel, L.
    Rae, I.
    Rameez, M.
    Rawlins, K.
    Redl, P.
    Reimann, R.
    Resconi, E.
    Rhode, W.
    Ribordy, M.
    Richman, M.
    Riedel, B.
    Rodrigues, J. P.
    Rott, C.
    Ruhe, T.
    Ruzybayev, B.
    Ryckbosch, D.
    Saba, S. M.
    Sander, H. -G
    Santander, M.
    Sarkar, S.
    Schatto, K.
    Scheriau, F.
    Schmidt, T.
    Schmitz, M.
    Schoenen, S.
    Schoeneberg, S.
    Schoenwald, A.
    Schukraft, A.
    Schulte, L.
    Schultz, D.
    Schulz, O.
    Secke, D.
    Sestayo, Y.
    Seunarine, S.
    Shanidze, R.
    Sheremata, C.
    Smith, M. W. E.
    Soldin, D.
    Spiczak, G. M.
    Spiering, C.
    Stamatikos, M.
    Stanev, T.
    Stanisha, N. A.
    Stasik, A.
    Stezelberger, T.
    Stokstad, R. G.
    Stoessl, A.
    Strahler, E. A.
    Ström, Rickard
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Strotjohann, N. L.
    Sullivan, G. W.
    Taavola, Henric
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Fysiska sektionen, Institutionen för fysik och astronomi, Högenergifysik.
    Taboada, I.
    Tamburro, A.
    Tepe, A.
    Ter-Antonyan, S.
    Tesic, G.
    Tilav, S.
    Toale, P. A.
    Tobin, M. N.
    Toscano, S.
    Tselengidou, M.
    Unger, E.
    Usner, M.
    Vallecorsa, S.
    van Eijndhoven, N.
    van Overloop, A.
    van Santen, J.
    Vehring, M.
    Voge, M.
    Vraeghe, M.
    Walck, C.
    Waldenmaier, T.
    Wallraff, M.
    Weaver, Ch.
    Wellons, M.
    Wendt, C.
    Westerhoff, S.
    Whitehorn, N.
    Wiebe, K.
    Wiebusch, C. H.
    Williams, D. R.
    Wissing, H.
    Wolf, M.
    Wood, T. R.
    Woschnagg, K.
    Xu, D. L.
    Xu, X. W.
    Yanez, J. P.
    Yodh, G.
    Yoshida, S.
    Zarzhitsky, P.
    Ziemann, J.
    Zierke, S.
    Zoll, M.
    The IceProd framework: Distributed data processing for the IceCube neutrino observatory. 2015. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 75, pp. 198-211. Journal article (Refereed)
    Abstract [en]

    IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework. (C) 2014 Elsevier Inc. All rights reserved.

  • 3.
    Bohm, Christian
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum.
    Danninger, Matthias
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Finley, Chad
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Flis, Samuel
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Hulth, Per Olof
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Hultqvist, Klas
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Walck, Christian
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Wolf, Martin
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    Zoll, Marcel
    Stockholms universitet, Naturvetenskapliga fakulteten, Fysikum. Stockholms universitet, Naturvetenskapliga fakulteten, Oskar Klein-centrum för kosmopartikelfysik (OKC).
    The IceProd framework: Distributed data processing for the IceCube neutrino observatory. 2015. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 75, pp. 198-211. Journal article (Refereed)
    Abstract [en]

    IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.

  • 4.
    Cao, Liang
    et al.
    Nanjing University of Posts and Telecommunications.
    Wang, Yufeng
    Nanjing University of Posts and Telecommunications.
    Zhang, Bo
    Nanjing University of Posts and Telecommunications.
    Jin, Qun
    Waseda University, Japan.
    Vasilakos, Athanasios
    Luleå tekniska universitet, Institutionen för system- och rymdteknik, Datavetenskap.
    GCHAR: An efficient Group-based Context-aware human activity recognition on smartphone. 2017. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848. Journal article (Refereed)
    Abstract [en]

    With smartphones becoming ubiquitous and equipped with various sensors, there is a trend towards implementing HAR (human activity recognition) algorithms and applications on smartphones, including health monitoring, self-managing systems, and fitness tracking. However, one of the main issues with existing HAR schemes is that the classification accuracy is relatively low, and improving the accuracy requires high computation overhead. In this paper, an efficient Group-based Context-aware classification method for human activity recognition on smartphones, GCHAR, is proposed. It exploits a hierarchical group-based scheme to improve classification efficiency and reduces classification error through context awareness rather than intensive computation. Specifically, GCHAR uses a two-level hierarchical classification structure, i.e., inter-group and inner-group, and utilizes the previous state and transition logic (so-called context awareness) to detect transitions among activity groups. In comparison with other popular classifiers such as RandomTree, Bagging, J48, BayesNet, KNN, and Decision Table, thorough experiments on a realistic dataset (UCI HAR repository) demonstrate that GCHAR achieves the best classification accuracy, reaching 94.1636%; its time consumption in the training stage is four times shorter than that of the simple Decision Table, and in the classification stage it is decreased by 72.21% in comparison with BayesNet.

  • 5.
    Grahn, Håkan
    Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
    Transactional Memory. 2010. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 70, no. 10, pp. 993-1008. Journal article (Refereed)
    Abstract [en]

    Current and future processor generations are based on multicore architectures where the performance increase comes from an increasing number of cores on a chip. In order to utilize the performance potential of multicore architectures the programs also need to be parallel, but writing parallel programs is a non-trivial task. Transactional memory tries to ease parallel program development by providing atomic and isolated execution of code sequences, enabling software composability and protected access to shared data. In addition, transactional memory has the ability to execute atomic code sequences in parallel as long as no data conflicts occur. Transactional memory implementation proposals exist for both hardware and software, as well as hybrid solutions. This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and finally, includes five articles that advance the state-of-the-art in transactional memory research.

  • 6.
    Grahn, Håkan
    et al.
    Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap.
    Stenström, Per
    Comparative evaluation of latency-tolerating and -reducing techniques for hardware-only and software-only directory protocols. 2000. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 60, no. 7, pp. 807-834. Journal article (Refereed)
    Abstract [en]

    We study in this paper how effective latency-tolerating and -reducing techniques are at cutting the memory access times for shared-memory multiprocessors with directory cache protocols managed by hardware and software. A critical issue for the relative efficiency is how many protocol operations such techniques trigger. This paper presents a framework that makes it possible to reason about the expected relative efficiency of a latency-tolerating or -reducing technique by focusing on whether the technique increases, decreases, or does not change the number of protocol operations at the memory module. Since software-only directory protocols handle these operations in software, they will perform relatively worse unless the technique reduces the number of protocol operations. Our experimental results from detailed architectural simulations driven by six applications from the SPLASH-2 parallel program suite confirm this expectation. We find that while prefetching performs relatively worse on software-only directory protocols due to useless prefetches, there are examples of protocol optimizations, e.g., optimizations for migratory data, that do relatively better on software-only directory protocols. Overall, this study shows that latency-tolerating techniques must be more carefully selected for software-centric than for hardware-centric implementations of distributed shared-memory systems. (C) 2000 Academic Press.

  • 7.
    Grahn, Håkan
    et al.
    Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap.
    Stenström, Per
    Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection. 1996. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 39, no. 2, pp. 168-180. Journal article (Refereed)
    Abstract [en]

    Although directory-based write-invalidate cache coherence protocols have a potential to improve the performance of large-scale multiprocessors, coherence misses limit the processor utilization. Therefore, so-called competitive-update protocols (hybrid protocols that on a per-block basis dynamically switch between write-invalidate and write-update) have been considered as a means to reduce the coherence miss rate and have been shown to be a better coherence policy for a wide range of applications. Unfortunately, such protocols may cause high traffic peaks for applications with extensive use of migratory objects. These traffic peaks can offset the performance gain of a reduced miss rate if the network bandwidth is not sufficient. We propose in this study to extend a competitive-update protocol with a previously published adaptive mechanism that can dynamically detect migratory objects and reduce the coherence traffic they cause. Detailed architectural simulations based on five scientific and engineering applications show that this adaptive protocol outperforms a write-invalidate protocol by reducing the miss rate and bandwidth needed by up to 71% and 26%, respectively.
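The per-block switching between write-update and write-invalidate mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the mechanism, so everything here is an assumption: a hypothetical `CachedBlock` class, a per-block counter of remote updates since the last local access, and an arbitrary threshold of 4. It is not the authors' protocol and it omits the migratory-data detection entirely.

```python
class CachedBlock:
    """One cached copy of a memory block (hypothetical model, not from the paper)."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold   # assumed tuning parameter
        self.valid = True            # True while this cache holds a usable copy
        self.remote_updates = 0      # remote updates seen since the last local access

    def local_access(self) -> bool:
        """Local read or write. Returns True on a hit, False on a coherence miss."""
        hit = self.valid
        self.valid = True            # a miss refetches the block
        self.remote_updates = 0      # local use shows the copy is still wanted
        return hit

    def remote_update(self) -> None:
        """Another processor wrote the block and sent an update to this copy."""
        if not self.valid:
            return                   # no copy here, nothing to update
        self.remote_updates += 1
        if self.remote_updates >= self.threshold:
            # The copy keeps receiving updates without being used locally:
            # drop it, so further writes behave like write-invalidate.
            self.valid = False
```

With a low threshold the block behaves almost like write-invalidate; with a high threshold it behaves almost like write-update, which is the trade-off between miss rate and update traffic that the abstract describes.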

  • 8.
    Guo, Yao
    et al.
    School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China.
    Vlassov, Vladimir
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Elektronik- och datorsystem, ECS.
    Ashok, Raksit
    Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA.
    Weiss, Richard
    The Evergreen State College, Olympia, WA 98505, USA.
    Andras Moritz, Csaba
    Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003, USA.
    Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization. 2008. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 68, no. 2, pp. 165-181. Journal article (Refereed)
    Abstract [en]

    The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel chip multiprocessors with 100s of processing elements. With such increasing levels of parallelism, synchronization is set to become a major performance bottleneck and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason perhaps why wide adoption has not followed. In this paper, we propose a novel approach called synchronization coherence that can provide transparent fine-grained synchronization and caching in a multiprocessor machine and single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with traditional cache coherence protocols. It reduces network utilization as well as synchronization related processing overheads while adding minimal hardware complexity as compared to cache coherence mechanisms or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% improvement in performance and up to 24% improvement in energy efficiency with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases up to 38% when simulating with an ideal L2 cache system.

  • 9. Ho, Ching-Tien
    et al.
    Raghunath, M.T.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    An Efficient Algorithm for Gray-to-Binary Permutation on Hypercubes. 1994. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 20, no. 1, pp. 114-120. Journal article (Refereed)
    Abstract [en]

    Both Gray code and binary code are frequently used in mapping arrays into hypercube architectures. While the former is preferred when communication between adjacent array elements is needed, the latter is preferred for FFT-type communication. When different phases of computations have different types of communication patterns, the need arises to remap the data. We give a nearly optimal algorithm for permuting data from a Gray code mapping to a binary code mapping on a hypercube with communication restricted to one input and one output channel per node at a time. Our algorithm improves over the best previously known algorithm [6] by nearly a factor of two and is optimal to within a factor of n/(n - 1) with respect to data transfer time on an n-cube. The expected speedup is confirmed by measurements on an Intel iPSC/2 hypercube.
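To make the remapping concrete, here is a minimal Python sketch of the permutation itself, assuming the standard binary-reflected Gray code. It only computes which node each element must move to; it does not reproduce the paper's communication schedule or its one-channel-per-node constraint. The function names are illustrative.

```python
def binary_to_gray(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray: XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def gray_to_binary_permutation(n: int) -> list[int]:
    """Destination node for the data held by each node of an n-cube.

    Under the Gray mapping, array element i lives on node binary_to_gray(i);
    under the binary mapping it lives on node i. So the data on node p
    must travel to node gray_to_binary(p).
    """
    return [gray_to_binary(p) for p in range(1 << n)]


if __name__ == "__main__":
    print(gray_to_binary_permutation(3))  # [0, 1, 3, 2, 7, 6, 4, 5]
```

Consecutive Gray codes differ in exactly one bit, which is why the Gray mapping places adjacent array elements on neighboring hypercube nodes.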

  • 10.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures. 1987. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 4, no. 2, pp. 133-179. Journal article (Refereed)
    Abstract [en]

    This paper presents a few algorithms for embedding loops and multidimensional arrays in hypercubes with emphasis on proximity preserving embeddings. A proximity preserving embedding minimizes the need for communication bandwidth in computations requiring nearest neighbor communication. Two storage schemes for "large" problems on "small" machines are suggested and analyzed, and algorithms for matrix transpose, multiplying matrices, factoring matrices, and solving triangular linear systems are presented. A few complete binary tree embeddings are described and analyzed. The data movement in the matrix algorithms is analyzed and it is shown that in the majority of cases the directed routing paths intersect only at nodes of the hypercube, allowing for a maximum degree of pipelining.

  • 11.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Performance Modeling of Distributed Memory Architectures. 1991. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 12, no. 4, pp. 300-312. Journal article (Refereed)
    Abstract [en]

    We provide performance models for several primitive operations on data structures distributed over memory units interconnected by a Boolean cube network. In particular, we model single-source and multiple-source concurrent broadcasting or reduction, concurrent gather and scatter operations, shifts along several axes of multidimensional arrays, and emulation of butterfly networks. We also show how the processor configuration, the data aggregation, and the encoding of the address space affect the performance for two important basic computations: the multiplication of arbitrarily shaped matrices and the Fast Fourier Transform. We also give an example of the performance behavior for local matrix operations for a processor with a single path to local memory and a set of processor registers. The analytic models are verified by measurements on the Connection Machine Model CM-2.

  • 12.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    Boolean Cube Emulation of Butterfly Networks Encoded by Gray Code. 1994. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 20, no. 3, pp. 261-179. Journal article (Refereed)
    Abstract [en]

    The authors present algorithms for butterfly emulation on binary-reflected Gray coded data that require the same number of element transfers in sequence in a Boolean cube network as for a binary encoding. The required code conversion is either performed in local memories, or through concurrent exchanges not affecting the number of element transfers in sequence. The emulation of a butterfly network with one or two elements per processor requires n communication cycles on an n-cube. For more than two elements per processor, one additional communication cycle is required for every pair of elements. The encoding on completion can be either binary, or binary-reflected Gray code, or any combination thereof, without affecting the communication complexity.

  • 13.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    Generalized Shuffle Permutations on Boolean Cubes. 1992. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 16, no. 1, pp. 1-14. Journal article (Refereed)
    Abstract [en]

    In a generalized permutation an address (a_{q-1} a_{q-2} ... a_0) receives its content from an address obtained through a cyclic shift on a subset of the q dimensions used for the encoding of the addresses. Bit-complementation may be combined with the shift. We give an algorithm that requires K/2 + 2 exchanges for K elements per processor, when storage dimensions are part of the permutation, and concurrent communication on all ports of every processor is possible. The number of element exchanges in sequence is independent of the number of processor dimensions ω_r in the permutation.
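The address transformation described in this abstract can be sketched in a few lines of Python. This is only an illustration of the permutation's definition, not the exchange algorithm the paper gives. The function name, the shift direction, and the use of an XOR mask for bit-complementation are assumptions made for the example.

```python
def shuffle_source(addr: int, dims: list[int], complement: int = 0) -> int:
    """Source address whose content is delivered to `addr`.

    `dims` lists the bit positions (dimensions) that take part in the
    cyclic shift, lowest first; all other address bits stay in place.
    `complement` is an XOR mask of bits to flip after the shift.
    The shift direction chosen here is an assumption.
    """
    bits = [(addr >> d) & 1 for d in dims]
    rotated = bits[-1:] + bits[:-1]          # cyclic shift by one position
    src = addr
    for d, b in zip(dims, rotated):
        src = (src & ~(1 << d)) | (b << d)   # write the shifted bit back
    return src ^ complement


if __name__ == "__main__":
    # Shift over all four dimensions of a 4-bit address: 0110 -> 1100
    print(format(shuffle_source(0b0110, [0, 1, 2, 3]), "04b"))
    # Shift over dimensions 0 and 2 only: 001 -> 100
    print(format(shuffle_source(0b001, [0, 2]), "03b"))
```

Because a cyclic shift of a fixed subset of bits followed by an XOR mask is invertible, every address has exactly one source, so the mapping is a permutation of the address space.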

  • 14. Kennedy, K.
    et al.
    Broom, B.
    Cooper, K.
    Dongarra, J.
    Fowler, R.
    Gannon, D.
    Johnsson, Lennart
    KTH, Tidigare Institutioner, Numerisk analys och datalogi, NADA.
    Mellor-Crummey, J.
    Torczon, L.
    Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries. 2001. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 61, no. 12, pp. 1803-1826. Journal article (Peer-reviewed)
    Abstract [en]

    As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap, the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called TeleGen, that will make it practical to construct efficient domain-specific high-level languages from annotated component libraries. We call these languages telescoping languages, because they can be nested within one another. For programs written in telescoping languages, high performance and reasonable compilation times can be achieved by exhaustively analyzing the component libraries in advance to produce a language processor that recognizes and optimizes library operations as primitives in the language. The key to making this strategy practical is to keep compile times low by generating a custom compiler with extensive built-in knowledge of the underlying libraries. The goal is to achieve compile times that are linearly proportional to the size of the program presented by the user, rather than to the aggregate size of that program plus the base libraries.

  • 15.
    Lampka, Kai
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorteknik.
    Forsberg, Björn
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi.
    Spiliopoulos, Vasileios
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Datorarkitektur och datorkommunikation.
    Keep it cool and in time: With runtime monitoring to thermal-aware execution speeds for deadline constrained systems. 2016. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 95, pp. 79-91. Journal article (Peer-reviewed)
    Abstract [en]

    The Dynamic Power and Thermal Management (DPTM) system of Dynamic Voltage Frequency Scaling (DVFS) enabled processors compensates for peak temperatures by slowing or even powering parts of the system down. While ensuring the integrity of computations, this comes with the drawback of losing performance. In the context of hard real-time systems, such unpredictable losses in performance are unacceptable, as they may lead to deadline misses which may in turn compromise the integrity of the system. To safely execute hard real-time workloads on such systems, this article presents an online scheme for assigning speeds in such a way that (a) the system executes at low clock speed as often as possible, while (b) deadline violations are strictly ruled out. The proposed scheme is compared with an offline scheme which has complete knowledge about arrival times and execution demands of the workload. The benchmarking shows that for a workload which is always very close to the modelled maximum, our approach performs on par with the offline scheme. For a workload which diverges from the modelled maximum more often, the speed assignments produced by our scheme become more pessimistic, so as to ensure that all deadlines are met.

  • 16.
    Lundberg, Lars
    Blekinge Tekniska Högskola, Institutionen för programvaruteknik och datavetenskap.
    Predicting and bounding the speedup of multithreaded Solaris programs. 1999. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, pp. 322-333. Journal article (Peer-reviewed)
    Abstract [en]

    In Solaris, threads are frequently relocated. The data associated with a relocated thread have to be moved from the cache of the old processor to the new processor. In order to avoid poor memory performance due to thread relocation, threads can be bound to processors (static scheduling). Finding a static schedule which results in maximum speedup is NP-hard. It is even difficult to determine if a static schedule is close to the optimal case or not. Here, a technique for predicting the speedup of multithreaded Solaris programs is presented. Based on an existing theoretical result, a lower bound on the maximal speedup is also obtained. The predicted speedup and the bound are based on recordings from a single-processor execution. When comparing the predictions with the real speedup using a multiprocessor with eight processors, we see that the predictions are very good. By comparing the speedup of a static schedule with the bound, we see that it is worthwhile to look for other schedules. (C) 1999 Academic Press.

  • 17.
    Namaki, Nima
    et al.
    Högskolan Väst, Institutionen för ekonomi och it, Avd för datavetenskap och informatik.
    de Blanche, Andreas
    Högskolan Väst, Institutionen för ekonomi och it, Avd för datavetenskap och informatik.
    Mankefors-Christiernin, Stefan
    Högskolan Väst, Institutionen för ekonomi och it, Avd för datavetenskap och informatik.
    Exhaustion dominated performance: an empirical method evaluation. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315. Journal article (Peer-reviewed)
  • 18.
    Nordström, Tomas
    et al.
    Division of Computer Science and Engineering, Department of Systems Engineering, Luleå University of Technology, Luleå, Sweden.
    Svensson, Bertil
    Högskolan i Halmstad, Akademin för informationsteknologi, Halmstad Embedded and Intelligent Systems Research (EIS).
    Using and designing massively parallel computers for artificial neural networks. 1992. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 14, no. 3, pp. 260-285. Article, research review (Peer-reviewed)
  • 19.
    Nordström, Tomas
    et al.
    Luleå tekniska universitet.
    Svensson, Bertil
    Luleå tekniska universitet.
    Using and designing massively parallel computers for artificial neural networks. 1992. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 14, no. 3, pp. 260-285. Journal article (Peer-reviewed)
    Abstract [en]

    During the past 10 years the fields of artificial neural networks (ANNs) and massively parallel computing have been evolving rapidly. The authors study the attempts to make ANN algorithms run on massively parallel computers as well as designs of new parallel systems tuned for ANN computing. Following a brief survey of the most commonly used models, the different dimensions of parallelism in ANN computing are identified, and the possibilities for mapping onto the structures of different parallel architectures are analyzed. Different classes of parallel architectures used or designed for ANN are identified. Reported implementations are reviewed and discussed. It is concluded that the regularity of ANN computations suits SIMD architectures perfectly and that broadcast or ring communication can be very efficiently utilized. Bit-serial processing is very interesting for ANN, but hardware support for multiplication should be included. Future artificial neural systems for real-time applications will require flexible processing modules that can be put together to form MIMSIMD systems.

  • 20. Rahmani, Amir M.
    et al.
    Liljeberg, Pasi
    Ayala, Jose L.
    Tenhunen, Hannu
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik.
    Veidenbaum, Alexander V.
    Special issue on energy efficient multi-core and many-core systems, Part I. 2016. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 95, pp. 1-2. Journal article (Other academic)
  • 21. Rahmani, Amir M.
    et al.
    Liljeberg, Pasi
    Ayala, Jose L.
    Tenhunen, Hannu
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Industriell och Medicinsk Elektronik.
    Veidenbaum, Alexander V.
    Special issue on energy efficient multi-core and many-core systems, Part II. 2017. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 100, pp. 128-129. Journal article (Other academic)
  • 22.
    Rizvandi, Nikzad Babaii
    et al.
    The University of Sydney, Australia, Natl ICT Australia NICTA, Sydney, NSW 1430, Australia.
    Taheri, Javid
    The University of Sydney, Australia.
    Zomaya, Albert
    The University of Sydney, Sydney, Australia.
    Some Observations on Optimal Frequency Selection in DVFS-based Energy Consumption Minimization. 2011. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 71, no. 8, pp. 1154-1164. Journal article (Peer-reviewed)
  • 23. Stoyenko, AD
    et al.
    Bosch, Jan
    Blekinge Tekniska Högskola, Institutionen för datavetenskap och ekonomi.
    Aksit, M
    Marlowe, TJ
    Load balanced mapping of distributed objects to minimize network communication. 1996. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, pp. 117-136. Journal article (Peer-reviewed)
    Abstract [en]

    This paper introduces a new load balancing and communication minimizing heuristic used in the Inverse Remote Procedure Call (IRPC) system. While the paper briefly describes the IRPC system, the focus is on the new IRPC assignment heuristic. The IRPC compiler maps a distributed program to a graph that represents program objects and their dependencies (due to invocations and parameter passing) as nodes and edges, respectively. In the graph, the system preserves conditional and iterative flows, records network transmission and execution costs, and marks nodes that have to reside at specific network sites. The graph is then partitioned by the heuristic to derive a (sub)optimal node assignment to network sites minimizing load balancing and network data transport. The resulting program partition is then reflected in the physical object distribution, and remote and local object communication is transparently implemented. The compiler and run-time system use efficient implementation techniques such as type prediction, inlining, splitting and subprogram passing. The last of these allows remote code to be copied to local data, as an alternative to copying data to the remote site, whenever this will reduce network data transport. The IRPC graph partitioning heuristic operates in time O(E(log d + l + log M)), where M is the number of network sites, E is the number of communication edges, and d is the maximum degree of a node; l is a parameter of the algorithm, and can vary between 1 and N, where N is the number of communicating objects. This complexity is more nearly independent of M, and considerably better in terms of E and N, than that of previously known related algorithms, such as A*, which employs backtracking and is potentially exponential, or the max-flow/min-cut class of network flow algorithms or heuristics, which tend to be at least of $\Omega(MN^2E)$, and it can be made (by choosing l appropriately) as efficient as even such fast heuristics as heaviest-edge-first, minimal communication, and Kernighan-Lin. In an extensive quantitative evaluation, the heuristic has been demonstrated to perform very well, giving on average 75% traffic cost reductions for over 95% of the programs when compared to random partitioning, and outperforming in cost reduction and actual execution time the three aforementioned fast heuristics, even with a large l. Thus, to the best of our knowledge, this is the first report of a well-performing assignment heuristic that is both essentially linear in the number of communication edges, and better than existing, established heuristics of no better complexity. (C) 1996 Academic Press, Inc.

  • 24.
    Sundell, Håkan
    et al.
    Högskolan i Borås, Institutionen Handels- och IT-högskolan.
    Tsigas, Philippas
    Lock-Free Deques and Doubly Linked Lists. 2008. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 68, no. 7, pp. 1008-1020. Journal article (Peer-reviewed)
    Abstract [en]

    We present a practical lock-free shared data structure that efficiently implements the operations of a concurrent deque as well as a general doubly linked list. The implementation supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of doubly linked lists are either based on non-available atomic synchronization primitives, only implement a subset of the functionality, or are not designed for disjoint accesses. Our algorithm only requires single-word compare-and-swap atomic primitives, supports fully dynamic list sizes, and allows traversal also through deleted nodes and thus avoids unnecessary operation retries. We have performed an empirical study of our new algorithm on two different multiprocessor platforms. Results of the experiments performed under high contention show that the performance of our implementation scales linearly with increasing number of processors. Considering deque implementations and systems with low concurrency, the algorithm by Michael shows the best performance. However, as our algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture.

  • 25.
    Taheri, Javid
    et al.
    The University of Sydney, Australia.
    Zomaya, Albert
    The University of Sydney, Sydney, Australia.
    Iftikhar, Mohsin
    Saudi Arabia.
    Fuzzy Online Location Management in Mobile Computing Environments. 2011. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 71, no. 8, pp. 1142-1153. Journal article (Peer-reviewed)
    Abstract [en]

    This paper presents a new approach, namely Intelligent Fuzzy Online Location Management Strategy (IFOLMS), based on Fuzzy clustering techniques to solve the mobile location management problem. Using a Fuzzy location estimator in this technique, mobile users' past movements are used in making future paging decisions by the network. IFOLMS has the potential to lead to massive savings in the number of network signal transactions that must be made to locate users. Performance of the proposed approach has been measured by using several test networks; it shows promising results - around 50% reduction in network cost - when compared to many of the existing location management techniques (including GSM). Results also provide new insights into the mobility management problem and its associated performance issues.

  • 26.
    Veanes, Margus
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för ADB och datalogi.
    Barklund, Jonas
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för ADB och datalogi.
    Natural cycletrees: Flexible interconnection graphs. 1996. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 33, no. 1, pp. 44-54. Journal article (Peer-reviewed)
    Abstract [en]

    Natural cycletrees, formally defined in this paper, are a subclass of Hamiltonian graphs with maximum degree 3 that contain a binary spanning tree. A natural cycletree used as an interconnection network thus directly supports broadcasting through the binary tree as well as nearest-neighbor communication through the cycle. Natural cycletrees have several other interesting properties; e.g., they are planar, easily extensible, and can be contracted using the same methods as for binary trees. The main results of the paper are: (i) Given an arbitrary basic binary spanning tree T, there exists a natural cycletree with a minimal number of edges for T. (ii) A natural cycletree has a very simple router. We give a superfast parallel algorithm that can establish near optimal router data for that router.

  • 27.
    Wittek, Peter
    et al.
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Darányi, Sándor
    Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
    Accelerating Text Mining Workloads in a MapReduce-based Distributed GPU Environment. 2013. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 73, no. 2, pp. 198-206. Journal article (Peer-reviewed)
    Abstract [en]

    Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs which are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a cluster of eight nodes with two NVidia Tesla M2050s attached to each, and we achieve considerable speedups for random projection and self-organizing maps.
