  • 1.
    Bessani, A.
    et al.
    Brandt, J.
    Bux, M.
    Cogo, V.
    Dimitrova, L.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Gholami, Ali
    KTH.
    Hakimzadeh, Kamal
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Hummel, M.
    Ismail, Mahmoud
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Leser, U.
    Litton, J.-E.
    Martinez, R.
    Niazi, Salman
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Reichel, J.
    Zimmermann, K.
    BiobankCloud: A platform for the secure storage, sharing, and processing of large biomedical data sets (2016). In: 1st International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2015 and Workshop on Big-Graphs Online Querying, Big-O(Q) 2015 held in conjunction with 41st International Conference on Very Large Data Bases, VLDB 2015, Springer, 2016, p. 89-105. Conference paper (Refereed)
    Abstract [en]

    Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache YARN, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind secure two-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes predefined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.
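The abstract above mentions reducing storage requirements with erasure coding but does not say which code BiobankCloud uses. As an illustration only, the sketch below assumes the simplest erasure code: one XOR parity block over k equal-sized data blocks (a "stripe"). It survives the loss of any single block while storing (k+1)/k of the data, instead of the 3x of HDFS's default replication factor of 3. The function names and stripe layout are hypothetical.

```python
from __future__ import annotations
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block to k data blocks (hypothetical stripe layout)."""
    if len({len(b) for b in data_blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) and return the data blocks."""
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    if missing:
        # XOR of all surviving blocks (data + parity) equals the lost block.
        survivors = [b for b in stripe if b is not None]
        stripe = list(stripe)
        stripe[missing[0]] = xor_blocks(survivors)
    return stripe[:-1]  # drop the parity block


def storage_overhead(k: int, parity: int) -> float:
    """Stored bytes per byte of user data for a k + parity layout."""
    return (k + parity) / k
```

With k = 10 and one parity block the overhead is 1.1x. Production systems typically use Reed-Solomon codes with several parity blocks per stripe so that they can survive more than one simultaneous failure.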

  • 2.
    Ismail, Mahmoud
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Bonds, August
    KTH.
    Niazi, Salman
    Logical Clocks AB.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Scalable Block Reporting for HopsFS (2019). In: 2019 IEEE International Congress on Big Data (BigData Congress), 2019, p. 157-164. Conference paper (Refereed)
    Abstract [en]

    Distributed hierarchical file systems typically decouple the storage of the file system’s metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system’s metadata and its blocks. Apache HDFS and HopsFS implement such a protocol, called block reporting, in which each data server periodically sends ground-truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude compared to the existing HDFS/HopsFS protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers.
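The abstract above does not spell out how the new protocol shrinks block reports. One general way to do it, sketched here as an assumption rather than as the paper's exact design, is to group a data server's blocks into a fixed number of buckets, exchange only a digest per bucket, and send full block lists only for buckets whose digest disagrees with the metadata servers' view. `NUM_BUCKETS`, the bucketing function, and all function names are hypothetical.

```python
from __future__ import annotations
import hashlib

NUM_BUCKETS = 4  # hypothetical; a real deployment would use many more buckets


def bucket_of(block_id: int) -> int:
    return block_id % NUM_BUCKETS


def bucket_digests(blocks: dict[int, int]) -> list[str]:
    """One SHA-256 digest per bucket over sorted (block_id, generation_stamp) pairs."""
    buckets: list[list[tuple[int, int]]] = [[] for _ in range(NUM_BUCKETS)]
    for block_id, gen_stamp in blocks.items():
        buckets[bucket_of(block_id)].append((block_id, gen_stamp))
    digests = []
    for entries in buckets:
        h = hashlib.sha256()
        for block_id, gen_stamp in sorted(entries):
            h.update(f"{block_id}:{gen_stamp};".encode())
        digests.append(h.hexdigest())
    return digests


def mismatched_buckets(datanode_blocks: dict[int, int],
                       metadata_view: dict[int, int]) -> list[int]:
    """Buckets whose contents must be reported in full."""
    ours = bucket_digests(datanode_blocks)
    theirs = bucket_digests(metadata_view)
    return [i for i, (a, b) in enumerate(zip(ours, theirs)) if a != b]


def full_report(datanode_blocks: dict[int, int], buckets: list[int]) -> dict[int, int]:
    """Full (block_id -> generation_stamp) entries for the requested buckets only."""
    wanted = set(buckets)
    return {b: g for b, g in datanode_blocks.items() if bucket_of(b) in wanted}
```

When nothing has changed, a report under this scheme is just `NUM_BUCKETS` digests, independent of how many blocks the server stores; only diverging buckets cost bandwidth proportional to their contents.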

  • 3.
    Ismail, Mahmoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Gebremeskel, Ermias
    Kakantousis, Theofilos
    Berthou, Gautier
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. RISE SICS, Sweden.
    Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata (2017). In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS 2017) / [ed] Lee, K.; Liu, L., IEEE Computer Society, 2017, p. 2525-2528. Conference paper (Refereed)
    Abstract [en]

    Hadoop is a popular system for storing, managing, and processing large volumes of data, but it has only bare-bones internal support for metadata, because metadata is a bottleneck and less metadata means more scalability. The result is a scalable platform with rudimentary access control that is neither user- nor developer-friendly. Also, metadata services built on Hadoop, such as SQL-on-Hadoop, access control, data provenance, and data governance, are necessarily implemented as eventually consistent services, resulting in increased development effort and more brittle software. In this paper, we present a new project-based multi-tenancy model for Hadoop, built on a new distribution of Hadoop that provides a distributed database backend for the metadata layer of the Hadoop Distributed File System (HDFS). We extend Hadoop's metadata model to introduce projects, datasets, and project-users as new core concepts that enable a user-friendly, UI-driven Hadoop experience. As our metadata service is backed by a transactional database, developers can easily extend metadata by adding new tables and ensure the strong consistency of extended metadata using both transactions and foreign keys.
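The last sentence of the abstract above describes extending metadata by adding tables and relying on transactions and foreign keys for consistency. The sketch below illustrates that idea with SQLite from Python's standard library, standing in for the distributed database Hopsworks actually uses; the `inodes` and `dataset_tags` tables and the helper functions are hypothetical.

```python
from __future__ import annotations
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE inodes (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Extended metadata added by a developer as a new table.
        CREATE TABLE dataset_tags (
            inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,
            tag      TEXT NOT NULL,
            PRIMARY KEY (inode_id, tag)
        );
        """
    )
    return conn


def create_dataset(conn: sqlite3.Connection, inode_id: int, name: str,
                   tags: list[str]) -> None:
    # One transaction: either the inode and all its tags are stored, or nothing is.
    with conn:
        conn.execute("INSERT INTO inodes (id, name) VALUES (?, ?)", (inode_id, name))
        conn.executemany(
            "INSERT INTO dataset_tags (inode_id, tag) VALUES (?, ?)",
            [(inode_id, t) for t in tags],
        )


def delete_inode(conn: sqlite3.Connection, inode_id: int) -> None:
    # ON DELETE CASCADE removes the extended metadata in the same transaction.
    with conn:
        conn.execute("DELETE FROM inodes WHERE id = ?", (inode_id,))


def tags_of(conn: sqlite3.Connection, inode_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT tag FROM dataset_tags WHERE inode_id = ? ORDER BY tag", (inode_id,)
    )
    return [r[0] for r in rows]
```

The foreign key prevents orphaned metadata rows, and the transaction boundary means a failed insert of extended metadata also rolls back the inode it describes, so no eventually consistent cleanup job is needed.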

  • 4.
    Ismail, Mahmoud
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Niazi, Salman
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Ronstrom, M.
    Haridi, S.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Scaling HDFS to more than 1 million operations per second with HopsFS (2017). In: Proceedings - 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 683-688. Conference paper (Refereed)
    Abstract [en]

    HopsFS is an open-source, next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. By removing the metadata bottleneck in Apache HDFS, HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second, at least 16 times higher throughput than HDFS. In particular, we discuss how we exploit recent high-performance features of NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions. Together with more traditional techniques, such as batching and write-ahead caches, we show how many incremental optimizations have enabled a revolution in distributed hierarchical file system performance.
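Application-defined partitioning and partition-pruned index scans, both named in the abstract above, can be illustrated with a toy model. Suppose inodes are placed on database partitions by their parent directory's ID. Then all entries of one directory share a partition, and listing that directory reads a single partition instead of all of them. The partition count, the modulo partitioning function, and the class names are assumptions for illustration, not the HopsFS schema.

```python
from __future__ import annotations
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    id: int
    parent_id: int
    name: str


class PartitionedInodeTable:
    """Toy table whose partition key is the parent directory's inode ID."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: dict[int, list[Inode]] = defaultdict(list)
        self.partitions_scanned = 0  # counts partitions touched by queries

    def partition_for(self, parent_id: int) -> int:
        # Hypothetical application-defined partitioning function.
        return parent_id % self.num_partitions

    def insert(self, inode: Inode) -> None:
        self.partitions[self.partition_for(inode.parent_id)].append(inode)

    def list_dir(self, parent_id: int) -> list[str]:
        # Partition pruning: the partition key is known, so one partition is read.
        part = self.partitions[self.partition_for(parent_id)]
        self.partitions_scanned += 1
        return sorted(i.name for i in part if i.parent_id == parent_id)

    def list_dir_full_scan(self, parent_id: int) -> list[str]:
        # Without pruning, every partition must be read.
        names: list[str] = []
        for p in range(self.num_partitions):
            self.partitions_scanned += 1
            names += [i.name for i in self.partitions[p] if i.parent_id == parent_id]
        return sorted(names)
```

Both methods return the same listing, but the pruned version touches one partition while the full scan touches all `num_partitions`; in a distributed database each avoided partition is an avoided network round trip.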

  • 5.
    Ismail, Mahmoud
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ronström, Mikael
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata (2019). In: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019, p. 92-101. Conference paper (Refereed)
    Abstract [en]

    Distributed OLTP databases are now used to manage metadata for distributed file systems, but the same databases cannot efficiently support complex queries or aggregations. To solve this problem, we introduce ePipe, a databus that both creates a consistent change stream for a distributed, hierarchical file system (HopsFS) and eventually delivers the correctly ordered stream with low latency to downstream clients. ePipe can be used to provide polyglot storage for file system metadata, allowing each metadata query to be handled by the most efficient engine for that query. For file system notifications, we show that ePipe achieves up to 56x higher throughput than HDFS INotify and Trumpet, with up to three orders of magnitude lower latency. For Spotify’s Hadoop workload, we show that ePipe can replicate all file system changes from HopsFS to Elasticsearch with an average replication lag of only 330 ms.
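A core property in the abstract above is that ePipe delivers file system changes to downstream clients in the correct order even though they are produced concurrently. The abstract does not say how ePipe assigns that order, so the minimal reorder buffer below assumes, hypothetically, that every change event carries a gap-free, monotonically increasing sequence number; it is not ePipe's actual mechanism.

```python
from __future__ import annotations
import heapq
from typing import Iterable, Iterator


def in_order(events: Iterable[tuple[int, str]],
             first_seq: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (seq, change) pairs strictly by seq, buffering early arrivals."""
    pending: list[tuple[int, str]] = []  # min-heap keyed on sequence number
    next_seq = first_seq
    for event in events:
        heapq.heappush(pending, event)
        # Release every buffered event that is now contiguous with those delivered.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)
            next_seq += 1
```

An event stays buffered until the gap before it is filled, so a lost event stalls delivery indefinitely; a real change-stream system also needs a way to detect and recover from missing events.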

  • 6.
    Niazi, Salman
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Ismail, Mahmoud
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Berthou, Gautier
    Leader Election Using NewSQL Database Systems (2015). In: Distributed Applications and Interoperable Systems: 15th IFIP WG 6.1 International Conference, DAIS 2015, Held as Part of the 10th International Federated Conference on Distributed Computing Techniques, DisCoTec 2015, Grenoble, France, June 2-4, 2015, Proceedings / [ed] Alysson Bessani and Sara Bouchenak, France: Springer, 2015, p. 158-172. Conference paper (Refereed)
    Abstract [en]

    Leader election protocols are a fundamental building block for replicated distributed services. They ease the design of leader-based coordination protocols that tolerate failures. In partially synchronous systems, designing a leader election algorithm that does not permit multiple leaders while the system is unstable is a complex task. As a result, many production systems use third-party distributed coordination services, such as ZooKeeper and Chubby, to provide a reliable leader election service. However, adding a third-party service such as ZooKeeper to a distributed system incurs additional operational costs and complexity: ZooKeeper instances must be kept running on at least three machines to ensure high availability. In this paper, we present a novel leader election protocol using NewSQL databases for partially synchronous systems that ensures at most one leader at any given time. The leader election protocol uses the database as distributed shared memory. Our work enables distributed systems that already use NewSQL databases to save the operational overhead of managing an additional third-party service for leader election. Our main contribution is the design, implementation, and validation of a practical leader election algorithm, based on NewSQL databases, whose performance is comparable to that of a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper.
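The abstract above describes using the database as distributed shared memory for leader election. The sketch below is a deliberately simplified, single-process model of that idea, not the paper's algorithm. Each process increments its own heartbeat counter in a shared table; a lock stands in for a database transaction. An observer treats a process as alive if its counter changed since the previous observation round and elects the alive process with the smallest ID. The class names and the round-based notion of time are assumptions.

```python
from __future__ import annotations
import threading


class HeartbeatTable:
    """Stands in for a database table of (process_id -> heartbeat counter) rows."""

    def __init__(self) -> None:
        self._rows: dict[int, int] = {}
        self._lock = threading.Lock()  # models a serializable transaction

    def heartbeat(self, pid: int) -> None:
        with self._lock:
            self._rows[pid] = self._rows.get(pid, 0) + 1

    def snapshot(self) -> dict[int, int]:
        with self._lock:
            return dict(self._rows)


class Observer:
    """Elects a leader from which counters advanced since the last round."""

    def __init__(self, table: HeartbeatTable) -> None:
        self.table = table
        self.previous: dict[int, int] = {}

    def elect(self) -> int | None:
        current = self.table.snapshot()
        alive = [pid for pid, c in current.items() if c != self.previous.get(pid)]
        self.previous = current
        return min(alive) if alive else None
```

In this toy every decision comes from one observer reading consistent snapshots, so at most one leader is trivially guaranteed. The hard part the abstract refers to is preserving that guarantee when many processes observe the table at different times in a partially synchronous system.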

  • 7.
    Niazi, Salman
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ismail, Mahmoud
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases (2019). In: Encyclopedia of Big Data Technologies / [ed] Sherif Sakr, Albert Y. Zomaya, Springer, 2019, p. 16-32. Chapter in book (Refereed)
    Abstract [en]

    Modern NewSQL database systems can be used to store fully normalized metadata for distributed hierarchical file systems, and provide high throughput and low operational latencies for the file system operations.
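Storing file system metadata fully normalized, as the abstract above describes, means each inode row records only its parent's ID and its own name, not a full path, so a path is resolved one component at a time. The sketch uses SQLite with a hypothetical `inodes` table constrained to be unique on `(parent_id, name)`; it illustrates the data model only, not the HopsFS schema or how HopsFS batches path resolution.

```python
from __future__ import annotations
import sqlite3

ROOT_ID = 1  # hypothetical inode ID of the root directory


def make_fs() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inodes (
               id        INTEGER PRIMARY KEY,
               parent_id INTEGER NOT NULL,
               name      TEXT NOT NULL,
               UNIQUE (parent_id, name)
           )"""
    )
    conn.execute("INSERT INTO inodes (id, parent_id, name) VALUES (?, 0, '')", (ROOT_ID,))
    return conn


def mkdir(conn: sqlite3.Connection, parent_id: int, name: str) -> int:
    cur = conn.execute(
        "INSERT INTO inodes (parent_id, name) VALUES (?, ?)", (parent_id, name)
    )
    return cur.lastrowid


def resolve(conn: sqlite3.Connection, path: str) -> int | None:
    """Walk the path component by component using the (parent_id, name) index."""
    inode_id = ROOT_ID
    for component in filter(None, path.split("/")):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, component),
        ).fetchone()
        if row is None:
            return None  # some component does not exist
        inode_id = row[0]
    return inode_id
```

Because children refer to their parent's ID rather than to its path, renaming a directory updates a single row, at the cost of one lookup per path component during resolution.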

  • 8.
    Niazi, Salman
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Ismail, Mahmoud
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Grohsschmiedt, Steffen
    Ronström, Mikael
    HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases (2017). In: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017, USENIX Association, 2017, p. 89-103. Conference paper (Refereed)
    Abstract [en]

    Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables clusters an order of magnitude larger, with higher throughput, compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.
