Digitala Vetenskapliga Arkivet

Results 301-350 of 3114
  • 301. Berrada, Dounia
    et al.
    Romero, Mario
    Georgia Institute of Technology, US.
    Abowd, Gregory
    Blount, Marion
    Davis, John
    Automatic Administration of the Get Up and Go Test (2007). In: HealthNet'07: Proceedings of the 1st ACM SIGMOBILE International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments, ACM Digital Library, 2007, p. 73-75. Conference paper (Refereed)
    Abstract [en]

    In-home monitoring using sensors has the potential to improve the lives of elderly and chronically ill persons, assist their families and friends in supervising their status, and provide early warning signs to the person's clinicians. The Get Up and Go test is a clinical test used to assess the balance and gait of a patient. We propose a way to automatically apply an abbreviated version of this test to patients in their residences using video data, without body-worn sensors or markers.

    Download full text (pdf)
    fulltext
  • 302.
    Besancon, Lonni
    et al.
    Linköping University, Department of Science and Technology, Media and Information Technology. Linköping University, Faculty of Science & Engineering.
    Semmo, Amir
    Univ Potsdam, Germany.
    Biau, David
    AP HP, France.
    Frachet, Bruno
    AP HP, France.
    Pineau, Virginie
    Inst Curie, France.
    Sariali, El Hadi
    AP HP, France.
    Soubeyrand, Marc
    AP HP, France.
    Taouachi, Rabah
    Inst Curie, France.
    Isenberg, Tobias
    INRIA, France.
    Dragicevic, Pierre
    INRIA, France.
    Reducing Affective Responses to Surgical Images and Videos Through Stylization (2020). In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 39, no 1, p. 462-483. Article in journal (Refereed)
    Abstract [en]

    We present the first empirical study on using colour manipulation and stylization to make surgery images/videos more palatable. While aversion to such material is natural, it limits many people's ability to satisfy their curiosity, educate themselves and make informed decisions. We selected a diverse set of image processing techniques to test them both on surgeons and lay people. While colour manipulation techniques and many artistic methods were found unusable by surgeons, edge-preserving image smoothing yielded good results both for preserving information (as judged by surgeons) and reducing repulsiveness (as judged by lay people). We then conducted a second set of interviews with surgeons to assess whether these methods could also be used on videos and derive good default parameters for information preservation. We provide extensive supplemental material.
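
    The edge-preserving smoothing that worked best is detailed in the article itself; purely as an illustration of the class of filter involved, a bilateral filter is one widely used edge-preserving smoother. A minimal OpenCV sketch (the file name and parameter values are arbitrary assumptions, not the study's settings):

        import cv2

        # Load a surgical image (placeholder path).
        img = cv2.imread("surgery_frame.png")

        # Bilateral filtering smooths homogeneous regions (e.g. blood, tissue)
        # while preserving strong edges such as instrument outlines.
        # d is the neighbourhood diameter; sigmaColor/sigmaSpace control how
        # aggressively colours are mixed and how far the filter reaches.
        smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

        cv2.imwrite("surgery_frame_smoothed.png", smoothed)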

  • 303. Beskow, Jonas
    On Talking Heads, Social Robots and what they can Teach us (2019). In: Proceedings of ICPhS, 2019. Conference paper (Refereed)
  • 304.
    Beskow, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
    Spoken and non-verbal interaction experiments with a social robot (2016). In: The Journal of the Acoustical Society of America, Acoustical Society of America, 2016, Vol. 140, no 3005. Conference paper (Refereed)
    Abstract [en]

    During recent years, we have witnessed the start of a revolution in personal robotics. Once associated with highly specialized manufacturing tasks, robots are rapidly starting to become part of our everyday lives. The potential of these systems is far-reaching; from co-worker robots that operate and collaborate with humans side-by-side to robotic tutors in schools that interact with humans in a shared environment. All of these scenarios require systems that are able to act and react in a social way. Evidence suggests that robots should leverage channels of communication that humans understand, despite differences in physical form and capabilities. We have developed Furhat, a social robot that is able to convey several important aspects of human face-to-face interaction such as visual speech, facial expression, and eye gaze by means of facial animation that is retro-projected on a physical mask. In this presentation, we cover a series of experiments attempting to quantify the effect of our social robot and how it compares to other interaction modalities. It is shown that a number of functions ranging from low-level audio-visual speech perception to vocabulary learning improve when compared to unimodal (e.g., audio-only) settings or 2D virtual avatars.

  • 305.
    Bettelani, Gemma Carolina
    et al.
    Research Center E. Piaggio, Dept. Information Engineering of University of Pisa, Pisa, Italy.
    Gabellieri, Chiara
    Research Center E. Piaggio, Dept. Information Engineering of University of Pisa, Pisa, Italy.
    Mengacci, Riccardo
    Research Center E. Piaggio, Dept. Information Engineering of University of Pisa, Pisa, Italy.
    Massa, Federico
    Research Center E. Piaggio, Dept. Information Engineering of University of Pisa, Pisa, Italy.
    Mannucci, Anna
    Örebro University, School of Science and Technology.
    Pallottino, Lucia
    Research Center E. Piaggio, Dept. Information Engineering of University of Pisa, Pisa, Italy.
    Robotics Laboratory within the Italian School-Work Transition Program in High Schools: A Case Study (2021). Conference paper (Refereed)
    Abstract [en]

    This paper presents a robotics laboratory that originated from a collaboration between a university and a high school within the Italian school-work transition program. The educational objective of the proposed lab is twofold: 1) to ease the transfer of robotics researchers' expertise into useful means for the students' learning; 2) to teach, by practice, the multidisciplinarity of robotics. We exploited the RoboCup Junior Race as a useful scenario to cover topics from 3D printing for fast prototyping to low-level and high-level controller design. An ad-hoc end-of-term student survey confirms the effectiveness of the approach. Finally, the paper includes some considerations on how general problems in the robotic and scientific community, such as gender issues and COVID-19 restrictions, can impact educational robotics activities.

    Download full text (pdf)
    Robotics Laboratory within the Italian School-Work Transition Program in High Schools: A Case Study
  • 306.
    Bevilacqua, Fernando
    et al.
    Federal University of Fronteira Sul, Chapecó, Brazil.
    Backlund, Per
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre.
    Engström, Henrik
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre.
    Proposal for Non-contact Analysis of Multimodal Inputs to Measure Stress Level in Serious Games (2015). In: VS-Games 2015: 7th International Conference on Games and Virtual Worlds for Serious Applications / [ed] Per Backlund; Henrik Engström; Fotis Liarokapis, Red Hook, NY: IEEE Computer Society, 2015, p. 171-174. Conference paper (Refereed)
    Abstract [en]

    The process of monitoring user emotions in serious games or human-computer interaction is usually obtrusive. The work-flow is typically based on sensors that are physically attached to the user. Sometimes those sensors completely disturb the user experience, such as finger sensors that prevent the use of keyboard/mouse. This short paper presents techniques used to remotely measure different signals produced by a person, e.g. heart rate, through the use of a camera and computer vision techniques. The analysis of a combination of such signals (multimodal input) can be used in a variety of applications such as emotion assessment and measurement of cognitive stress. We present a research proposal for measurement of player’s stress level based on a non-contact analysis of multimodal user inputs. Our main contribution is a survey of commonly used methods to remotely measure user input signals related to stress assessment.
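
    One of the surveyed non-contact signals is the heart rate recovered from facial video (remote photoplethysmography). As a hedged illustration only, not the authors' pipeline, a crude estimate can be read off the mean green-channel intensity of a face region over time:

        import numpy as np

        def estimate_heart_rate(face_frames, fps):
            """Crude remote-PPG sketch. face_frames: sequence of RGB face crops
            (H x W x 3) sampled at `fps` frames per second."""
            # The blood-volume pulse modulates skin colour most strongly in the
            # green channel, so track its mean intensity per frame.
            signal = np.array([frame[:, :, 1].mean() for frame in face_frames])
            signal = signal - signal.mean()        # remove the DC component

            # Dominant frequency within a plausible heart-rate band.
            spectrum = np.abs(np.fft.rfft(signal))
            freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
            band = (freqs > 0.7) & (freqs < 4.0)   # roughly 42-240 bpm
            peak = freqs[band][np.argmax(spectrum[band])]
            return peak * 60.0                     # beats per minute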

  • 307.
    Bešenić, Krešimir
    et al.
    Faculty of Electrical Engineering and Computing, University of Zagreb.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pandžić, Igor
    Faculty of Electrical Engineering and Computing, University of Zagreb.
    Let Me Take a Better Look: Towards Video-Based Age Estimation (2024). In: Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods - ICPRAM, Rome, Italy, 2024, p. 57-59. Conference paper (Refereed)
    Abstract [en]

    Taking a better look at subjects of interest helps humans to improve confidence in their age estimation. Unlike still images, sequences offer spatio-temporal dynamic information that contains many cues related to age progression. A review of previous work on video-based age estimation indicates that this is an underexplored field of research. This may be caused by the lack of a well-defined and publicly accessible video benchmark protocol, as well as the absence of video-oriented training data. To address the former issue, we propose a carefully designed video age estimation benchmark protocol and make it publicly available. To address the latter issue, we design a video-specific age estimation method that leverages pseudo-labeling and semi-supervised learning. Our results show that the proposed method outperforms image-based baselines on both offline and online benchmark protocols, while the online estimation stability is improved by more than 50%.

  • 308.
    Bešenić, Krešimir
    et al.
    Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pandžić, Igor
    Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia.
    Unsupervised Facial Biometric Data Filtering for Age and Gender Estimation (2019). In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP 2019), SciTePress, 2019, Vol. 5, p. 209-217. Conference paper (Refereed)
    Abstract [en]

    Availability of large training datasets was essential for the recent advancement and success of deep learning methods. Due to the difficulties related to biometric data collection, datasets with age and gender annotations are scarce and usually limited in terms of size and sample diversity. Web-scraping approaches for automatic data collection can produce large amounts of weakly labeled, noisy data. The unsupervised facial biometric data filtering method presented in this paper greatly reduces label noise levels in web-scraped facial biometric data. Experiments on two large state-of-the-art web-scraped facial datasets demonstrate the effectiveness of the proposed method, with respect to training and validation scores, training convergence, and generalization capabilities of trained age and gender estimators.

    Download full text (pdf)
    fulltext
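
    The paper defines its own filtering criterion; the sketch below shows a generic identity-consistency idea of the same flavour (an assumption made for illustration, not necessarily the authors' rule): drop web-scraped images whose face embedding lies far from the centroid of all embeddings collected for that identity.

        import numpy as np

        def filter_identity_images(embeddings, keep_fraction=0.8):
            """embeddings: (N, D) face embeddings scraped for one identity.
            Returns a boolean mask marking the samples to keep."""
            centroid = embeddings.mean(axis=0)
            dist = np.linalg.norm(embeddings - centroid, axis=1)
            # Keep the images closest to the identity centroid; outliers are
            # likely mislabeled faces (co-stars, thumbnails, wrong person).
            threshold = np.quantile(dist, keep_fraction)
            return dist <= threshold
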
  • 309.
    Bešenić, Krešimir
    et al.
    Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.
    Ahlberg, Jörgen
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Pandžić, Igor S.
    Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.
    Picking out the bad apples: unsupervised biometric data filtering for refined age estimation (2023). In: The Visual Computer, ISSN 0178-2789, E-ISSN 1432-2315, Vol. 39, p. 219-237. Article in journal (Refereed)
    Abstract [en]

    Introduction of large training datasets was essential for the recent advancement and success of deep learning methods. Due to the difficulties related to biometric data collection, facial image datasets with biometric trait labels are scarce and usually limited in terms of size and sample diversity. Web-scraping approaches for automatic data collection can produce large amounts of weakly labeled and noisy data. This work is focused on picking out the bad apples from web-scraped facial datasets by automatically removing erroneous samples that impair their usability. The unsupervised facial biometric data filtering method presented in this work greatly reduces label noise levels in web-scraped facial biometric data. Experiments on two large state-of-the-art web-scraped datasets demonstrate the effectiveness of the proposed method with respect to real and apparent age estimation based on five different age estimation methods. Furthermore, we apply the proposed method, together with a newly devised strategy for merging multiple datasets, to data collected from three major web-based data sources (i.e., IMDb, Wikipedia, Google) and derive the new Biometrically Filtered Famous Figure Dataset or B3FD. The proposed dataset, which is made publicly available, enables considerable performance gains for all tested age estimation methods and age estimation tasks. This work highlights the importance of training data quality compared to data quantity and selection of the estimation method.

    Download full text (pdf)
    fulltext
  • 310.
    Bhat, Goutam
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Incept Inst Artificial Intelligence, U Arab Emirates.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Combining Local and Global Models for Robust Re-detection (2018). In: Proceedings of AVSS 2018. 2018 IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 25-30. Conference paper (Refereed)
    Abstract [en]

    Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual tracking. However, these methods still struggle in occlusion and out-of-view scenarios due to the absence of a re-detection component. While such a component requires global knowledge of the scene to ensure robust re-detection of the target, the standard DCF is only trained on the local target neighborhood. In this paper, we augment the state-of-the-art DCF tracking framework with a re-detection component based on a global appearance model. First, we introduce a tracking confidence measure to detect target loss. Next, we propose a hard negative mining strategy to extract background distractor samples, used for training the global model. Finally, we propose a robust re-detection strategy that combines the global and local appearance model predictions. We perform comprehensive experiments on the challenging UAV123 and LTB35 datasets. Our approach shows consistent improvements over the baseline tracker, setting a new state-of-the-art on both datasets.

    Download full text (pdf)
    Combining Local and Global Models for Robust Re-detection
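
    The tracking confidence measure used to detect target loss is the paper's own; as an illustration of the general idea, a common confidence score for correlation-filter trackers is the peak-to-sidelobe ratio (PSR) of the response map, where a low value suggests the target may be lost and re-detection should be triggered:

        import numpy as np

        def peak_to_sidelobe_ratio(response, exclude=5):
            """response: 2-D correlation response map from a DCF tracker."""
            peak_idx = np.unravel_index(np.argmax(response), response.shape)
            peak = response[peak_idx]

            # Mask a small window around the peak; the rest is the sidelobe.
            sidelobe = response.astype(float).copy()
            r0, c0 = max(peak_idx[0] - exclude, 0), max(peak_idx[1] - exclude, 0)
            sidelobe[r0:peak_idx[0] + exclude + 1, c0:peak_idx[1] + exclude + 1] = np.nan

            return (peak - np.nanmean(sidelobe)) / (np.nanstd(sidelobe) + 1e-8)

        # Example policy: fall back to global re-detection on low confidence.
        # if peak_to_sidelobe_ratio(response) < 4.0:
        #     run_global_redetection()   # hypothetical re-detection step
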
  • 311.
    Bhat, Goutam
    et al.
    Swiss Fed Inst Technol, Switzerland.
    Danelljan, Martin
    Swiss Fed Inst Technol, Switzerland.
    Timofte, Radu
    Swiss Fed Inst Technol, Switzerland; Julius Maximilian Univ Wurzburg, Germany.
    Cao, Yizhen
    Commun Univ China, Peoples R China.
    Cao, Yuntian
    Commun Univ China, Peoples R China.
    Chen, Meiya
    Xiaomi, Peoples R China.
    Chen, Xihao
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Cheng, Shen
    Megvii Technol, Peoples R China.
    Dudhane, Akshay
    Mohamed Bin Zayed Univ AI MBZUAI, U Arab Emirates.
    Fan, Haoqiang
    Megvii Technol, Peoples R China.
    Gang, Ruipeng
    UHDTV Res & Applicat Lab, Peoples R China.
    Gao, Jian
    SRC B, Peoples R China.
    Gu, Yan
    UESTC, Peoples R China.
    Huang, Jie
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Huang, Liufeng
    South China Univ Technol, Peoples R China.
    Jo, Youngsu
    Sogang Univ, South Korea.
    Kang, Sukju
    Sogang Univ, South Korea.
    Khan, Salman
    MBZUAI, U Arab Emirates; Australian Natl Univ ANU, Australia.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. MBZUAI, U Arab Emirates.
    Kondo, Yuki
    Toyota Technol Inst TTI, Japan.
    Li, Chenghua
    Chinese Acad Sci, Peoples R China.
    Li, Fangya
    Commun Univ China, Peoples R China.
    Li, Jinjing
    Commun Univ China, Peoples R China.
    Li, Youwei
    Megvii Technol, Peoples R China.
    Li, Zechao
    Nanjing Univ Sci & Technol, Peoples R China.
    Liu, Chenming
    UHDTV Research and Application Laboratory.
    Liu, Shuaicheng
    Megvii Technol, Peoples R China; Univ Elect Sci & Technol China UESTC, Peoples R China.
    Liu, Zikun
    SRC B, Peoples R China.
    Liu, Zhuoming
    South China Univ Technol, Peoples R China.
    Luo, Ziwei
    Megvii Technol, Peoples R China.
    Luo, Zhengxiong
    CASIA, Peoples R China.
    Mehta, Nancy
    Indian Inst Technol Ropar IIT Ropar, India.
    Murala, Subrahmanyam
    Indian Inst Technol Ropar IIT Ropar, India.
    Nam, Yoonchan
    Sogang Univ, South Korea.
    Nakatani, Chihiro
    Toyota Technol Inst TTI, Japan.
    Ostyakov, Pavel
    Huawei, Peoples R China.
    Pan, Jinshan
    Nanjing Univ Sci & Technol, Peoples R China.
    Song, Ge
    USTC, Peoples R China.
    Sun, Jian
    Megvii Technol, Peoples R China.
    Sun, Long
    Nanjing Univ Sci & Technol, Peoples R China.
    Tang, Jinhui
    Nanjing Univ Sci & Technol, Peoples R China.
    Ukita, Norimichi
    Toyota Technol Inst TTI, Japan.
    Wen, Zhihong
    Megvii Technol, Peoples R China.
    Wu, Qi
    Megvii Technol, Peoples R China.
    Wu, Xiaohe
    Harbin Inst Technol, Peoples R China.
    Xiao, Zeyu
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Xiong, Zhiwei
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Xu, Rongjian
    Harbin Inst Technol, Peoples R China.
    Xu, Ruikang
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Yan, Youliang
    Huawei, Peoples R China.
    Yang, Jialin
    WHU, Peoples R China.
    Yang, Wentao
    South China Univ Technol, Peoples R China.
    Yang, Zhongbao
    Nanjing Univ Sci & Technol, Peoples R China.
    Yasue, Fuma
    Toyota Technol Inst TTI, Japan.
    Yao, Mingde
    USTC, Peoples R China; Univ Sci & Technol China, Peoples R China.
    Yu, Lei
    Megvii Technol, Peoples R China.
    Zhang, Cong
    Xiaomi, Peoples R China.
    Zamir, Syed Waqas
    Incept Inst Artificial Intelligence IIAI, U Arab Emirates.
    Zhang, Jianxing
    SRC B, Peoples R China.
    Zhang, Shuohao
    Harbin Inst Technol, Peoples R China.
    Zhang, Zhilu
    Harbin Inst Technol, Peoples R China.
    Zheng, Qian
    Commun Univ China, Peoples R China.
    Zhou, Gaofeng
    Xiaomi, Peoples R China.
    Zhussip, Magauiya
    Huawei, Peoples R China.
    Zou, Xueyi
    Huawei, Peoples R China.
    Zuo, Wangmeng
    Harbin Inst Technol, Peoples R China.
    NTIRE 2022 Burst Super-Resolution Challenge (2022). In: 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2022), IEEE, 2022, p. 1040-1060. Conference paper (Refereed)
    Abstract [en]

    Burst super-resolution has received increased attention in recent years due to its applications in mobile photography. By merging information from multiple shifted images of a scene, burst super-resolution aims to recover details which otherwise cannot be obtained using a single input image. This paper reviews the NTIRE 2022 challenge on burst super-resolution. In the challenge, the participants were tasked with generating a clean RGB image with 4x higher resolution, given a RAW noisy burst as input. That is, the methods need to perform joint denoising, demosaicking, and super-resolution. The challenge consisted of 2 tracks. Track 1 employed synthetic data, where pixel-accurate high-resolution ground truths are available. Track 2, on the other hand, used real-world bursts captured from a handheld camera, along with approximately aligned reference images captured using a DSLR. 14 teams participated in the final testing phase. The top performing methods establish a new state-of-the-art on the burst super-resolution task.
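
    As a toy illustration of the merging idea behind the task (not a challenge entry; the submitted methods operate on RAW mosaics and also upsample), burst frames can be aligned by phase correlation and averaged:

        import numpy as np

        def naive_burst_merge(burst):
            """burst: list of equally sized 2-D grayscale frames.
            Aligns every frame to the first by integer-pixel phase correlation,
            then averages; real burst SR additionally denoises, demosaicks and
            super-resolves."""
            ref = burst[0].astype(np.float64)
            ref_fft = np.fft.fft2(ref)
            acc = ref.copy()
            for frame in burst[1:]:
                f = frame.astype(np.float64)
                # Normalised cross-power spectrum; its inverse FFT peaks at the shift.
                cross = ref_fft * np.conj(np.fft.fft2(f))
                corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
                dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
                # Convert wrap-around peak coordinates to signed shifts.
                dy = dy - ref.shape[0] if dy > ref.shape[0] // 2 else dy
                dx = dx - ref.shape[1] if dx > ref.shape[1] // 2 else dx
                acc += np.roll(f, (dy, dx), axis=(0, 1))
            return acc / len(burst)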

  • 312.
    Bhat, Goutam
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Johnander, Joakim
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Danelljan, Martin
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Khan, Fahad Shahbaz
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    Unveiling the power of deep tracking (2018). In: Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II / [ed] Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu and Yair Weiss, Cham: Springer Publishing Company, 2018, p. 493-509. Conference paper (Refereed)
    Abstract [en]

    In the field of generic object tracking numerous attempts have been made to exploit deep features. Despite all expectations, deep trackers are yet to reach an outstanding level of performance compared to methods solely based on handcrafted features. In this paper, we investigate this key issue and propose an approach to unlock the true potential of deep features for tracking. We systematically study the characteristics of both deep and shallow features, and their relation to tracking accuracy and robustness. We identify the limited data and low spatial resolution as the main challenges, and propose strategies to counter these issues when integrating deep features for tracking. Furthermore, we propose a novel adaptive fusion approach that leverages the complementary properties of deep and shallow features to improve both robustness and accuracy. Extensive experiments are performed on four challenging datasets. On VOT2017, our approach significantly outperforms the top performing tracker from the challenge with a relative gain of >17% in EAO.

    Download full text (pdf)
    Unveiling the power of deep tracking
  • 313.
    Bhatt, Dulari
    et al.
    Parul University, India.
    Patel, Chirag
    DEPSTAR, India.
    Talsania, Hardik
    Parul University, India.
    Patel, Jigar
    Parul University, India.
    Vaghela, Rasmika
    Parul University, India.
    Pandya, Sharnil
    Symbiosis Int Deemed University, India.
    Modi, Kirit
    Sankalchand Patel University, India.
    Ghayvat, Hemant
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope (2021). In: Electronics, E-ISSN 2079-9292, Vol. 10, no 20, article id 2470. Article, review/survey (Refereed)
    Abstract [en]

    Computer vision is becoming an increasingly trendy word in the area of image processing. With the emergence of computer vision applications, there is a significant demand to recognize objects automatically. Deep CNN (convolution neural network) has benefited the computer vision community by producing excellent results in video processing, object recognition, picture classification and segmentation, natural language processing, speech recognition, and many other fields. Furthermore, the introduction of large amounts of data and readily available hardware has opened new avenues for CNN study. Several inspirational concepts for the progress of CNN have been investigated, including alternative activation functions, regularization, parameter optimization, and architectural advances. Furthermore, achieving innovations in architecture results in a tremendous enhancement in the capacity of the deep CNN. Significant emphasis has been given to leveraging channel and spatial information, with a depth of architecture and information processing via multi-path. This survey paper focuses mainly on the primary taxonomy and newly released deep CNN architectures, and it divides numerous recent developments in CNN architectures into eight groups. Spatial exploitation, multi-path, depth, breadth, dimension, channel boosting, feature-map exploitation, and attention-based CNN are the eight categories. The main contribution of this manuscript is in comparing various architectural evolutions in CNN by its architectural change, strengths, and weaknesses. Besides, it also includes an explanation of the CNN's components, the strengths and weaknesses of various CNN variants, research gap or open challenges, CNN applications, and the future research direction.

  • 314.
    Bhatt, Mehul
    Örebro University, School of Science and Technology.
    Visuospatial Commonsense: On Neurosymbolic Reasoning and Learning about Space and Motion (2022). In: Spatio-Temporal Reasoning and Learning 2022: Proceedings of the 1st International Workshop on Spatio-Temporal Reasoning and Learning (STRL 2022) co-located with the 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (IJCAI 2022, ECAI 2022), Vienna, Austria, July 24, 2022 / [ed] Michael Sioutis; Zhiguo Long; John Stell; Jochen Renz, Technical University of Aachen, 2022, Vol. 3190. Conference paper (Refereed)
  • 315.
    Bhatt, Mehul
    et al.
    SFB/TR 8 Spatial Cognition, University of Bremen, Bremen, Germany.
    Dylla, Frank
    SFB/TR 8 Spatial Cognition, University of Bremen, Bremen, Germany.
    A Qualitative Model of Dynamic Scene Analysis and Interpretation in Ambient Intelligence Systems (2009). In: International Journal of Robotics and Automation, ISSN 0826-8185, Vol. 24, no 3, p. 235-244. Article in journal (Refereed)
    Abstract [en]

    Ambient intelligence environments necessitate representing and reasoning about dynamic spatial scenes and configurations. The ability to perform predictive and explanatory analyses of spatial scenes is crucial towards serving a useful intelligent function within such environments. We present a formal qualitative model that combines existing qualitative theories about space with a formal logic-based calculus suited to modelling dynamic environments, or reasoning about action and change in general. With this approach, it is possible to represent and reason about arbitrary dynamic spatial environments within a unified framework. We clarify and elaborate on our ideas with examples grounded in a smart environment.

  • 316.
    Bhatt, Mehul
    et al.
    Department of Computer Science, La Trobe University, Germany.
    Loke, Seng
    Department of Computer Science, La Trobe University, Germany.
    Modelling Dynamic Spatial Systems in the Situation Calculus (2008). In: Spatial Cognition and Computation, ISSN 1387-5868, E-ISSN 1573-9252, Vol. 8, no 1-2, p. 86-130. Article in journal (Refereed)
    Abstract [en]

    We propose and systematically formalise a dynamical spatial systems approach for the modelling of changing spatial environments. The formalisation adheres to the semantics of the situation calculus and includes a systematic account of key aspects that are necessary to realize a domain-independent qualitative spatial theory that may be utilised across diverse application domains. The spatial theory is primarily derivable from the all-pervasive generic notion of "qualitative spatial calculi" that are representative of differing aspects of space. In addition, the theory also includes aspects, both ontological and phenomenal in nature, that are considered inherent in dynamic spatial systems. Foundational to the formalisation is a causal theory that adheres to the representational and computational semantics of the situation calculus. This foundational theory provides the necessary (general) mechanism required to represent and reason about changing spatial environments and also includes an account of the key fundamental epistemological issues concerning the frame and the ramification problems that arise whilst modelling change within such domains. The main advantage of the proposed approach is that based on the structure and semantics of the proposed framework, fundamental reasoning tasks such as projection and explanation directly follow. Within the specialised spatial reasoning domain, these translate to spatial planning/re-configuration, causal explanation and spatial simulation. Our approach is based on the hypothesis that alternate formalisations of existing qualitative spatial calculi using high-level tools such as the situation calculus are essential for their utilisation in diverse application domains such as intelligent systems, cognitive robotics and event-based GIS.
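
    For orientation, the Reiter-style successor state axioms that such situation calculus formalisations typically rely on have the following general form (standard background material, not a formula quoted from this article): a fluent F holds after doing action a in situation s exactly when a made it true, or it already held and a did not make it false,

        F(\vec{x}, do(a, s)) \equiv \gamma^{+}_{F}(\vec{x}, a, s) \lor \bigl( F(\vec{x}, s) \land \lnot \gamma^{-}_{F}(\vec{x}, a, s) \bigr)

    where \gamma^{+}_{F} and \gamma^{-}_{F} collect the conditions under which actions make F true or false, respectively.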

  • 317.
    Bhatt, Mehul
    et al.
    Örebro University, School of Science and Technology.
    Suchan, Jakob
    University of Bremen, Bremen, Germany.
    Cognitive Vision and Perception: Deep Semantics Integrating AI and Vision for Reasoning about Space, Motion, and Interaction (2020). In: ECAI 2020 / [ed] Giuseppe De Giacomo; Alejandro Catala; Bistra Dilkina; Michela Milano; Senén Barro; Alberto Bugarín; Jérôme Lang, IOS Press, 2020, Vol. 325, p. 2881-2882. Conference paper (Refereed)
    Abstract [en]

    Semantic interpretation of dynamic visuospatial imagery calls for a general and systematic integration of methods in knowledge representation and computer vision. Towards this, we highlight research articulating & developing deep semantics, characterised by the existence of declarative models –e.g., pertaining space and motion– and corresponding formalisation and reasoning methods supporting capabilities such as semantic question-answering, relational visuospatial learning, and (non-monotonic) visuospatial explanation. We position a working model for deep semantics by highlighting select recent / closely related works from IJCAI, AAAI, ILP, and ACS. We posit that human-centred, explainable visual sensemaking necessitates both high-level semantics and low-level visual computing, with the highlighted works providing a model for systematic, modular integration of diverse multifaceted techniques developed in AI, ML, and Computer Vision.

  • 318.
    Bhatt, Mehul
    et al.
    Örebro University, School of Science and Technology.
    Suchan, Jakob
    University of Bremen, Bremen, Germany.
    Vardarajan, Srikrishna
    CoDesign Lab EU.
    Deep Semantics for Explainable Visuospatial Intelligence: Perspectives on Integrating Commonsense Spatial Abstractions and Low-Level Neural Features (2019). In: Proceedings of the 2019 International Workshop on Neural-Symbolic Learning and Reasoning: Annual workshop of the Neural-Symbolic Learning and Reasoning Association / [ed] Derek Doran; Artur d'Avila Garcez; Freddy Lecue, 2019. Conference paper (Refereed)
    Abstract [en]

    High-level semantic interpretation of (dynamic) visual imagery calls for general and systematic methods integrating techniques in knowledge representation and computer vision. Towards this, we position "deep semantics", denoting the existence of declarative models –e.g., pertaining "space and motion"– and corresponding formalisation and methods supporting (domain-independent) explainability capabilities such as semantic question-answering, relational (and relationally-driven) visuospatial learning, and (non-monotonic) visuospatial abduction. Rooted in recent work, we summarise and report the status quo on deep visuospatial semantics —and our approach to neurosymbolic integration and explainable visuo-spatial computing in that context— with developed methods and tools in diverse settings such as behavioural research in psychology, art & social sciences, and autonomous driving.

  • 319.
    Bhattacharyya, Subhajit
    et al.
    ECE Department, Mallabhum Institute of Technology, West Bengal, India.
    Chakraborty, Subham
    CSE Department, Mallabhum Institute of Technology, West Bengal, India.
    Reconstruction of Human Faces from Its Eigenfaces (2014). In: International Journal of Advanced Research In Computer Science and Software Engineering, ISSN 2277-6451, E-ISSN 2277-128X, Vol. 4, no 1, p. 209-215. Article in journal (Refereed)
    Abstract [en]

    Eigenface or Principal Components Analysis (PCA) methods have demonstrated their success in face recognition, detection and tracking. In this paper we have used this concept to reconstruct or represent a face as a linear combination of a set of basis images. The basis images are nothing but the eigenfaces. The idea is similar to representing a signal as a linear combination of complex sinusoids, as in the Fourier series. The main advantage is that the number of eigenfaces required is less than the number of face images in the database. Selection of the number of eigenfaces is important here. We investigate the minimum number of eigenfaces required for faithful reproduction of a face image.

    Download full text (pdf)
    Reconstruction of Human Faces from Its Eigenfaces
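
    The reconstruction described above is a standard PCA projection; a minimal sketch, assuming a matrix of vectorized, equally sized face images, of rebuilding a face from its first k eigenfaces:

        import numpy as np

        def reconstruct_face(faces, target, k):
            """faces: (N, P) matrix of N training faces flattened to P pixels.
            target: (P,) face to reconstruct. k: number of eigenfaces kept."""
            mean_face = faces.mean(axis=0)
            centered = faces - mean_face

            # Eigenfaces are the principal components of the training faces;
            # the SVD of the centered data gives them as the rows of Vt.
            _, _, vt = np.linalg.svd(centered, full_matrices=False)
            eigenfaces = vt[:k]                              # (k, P)

            # Project onto the eigenfaces, then rebuild the face as
            # mean face + weighted sum of the k basis images.
            weights = eigenfaces @ (target - mean_face)      # (k,)
            return mean_face + weights @ eigenfaces
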
  • 320.
    Bhunia, Ankan Kumar
    et al.
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Khan, Salman
    Mohamed bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
    Cholakkal, Hisham
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Anwer, Rao Muhammad
    Mohamed bin Zayed Univ AI, U Arab Emirates; Aalto Univ, Finland.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed bin Zayed Univ AI, U Arab Emirates.
    Laaksonen, Jorma
    Aalto Univ, Finland.
    Felsberg, Michael
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering.
    DoodleFormer: Creative Sketch Drawing with Transformers (2022). In: COMPUTER VISION - ECCV 2022, PT XVII, SPRINGER INTERNATIONAL PUBLISHING AG, 2022, Vol. 13677, p. 338-355. Conference paper (Refereed)
    Abstract [en]

    Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state-of-the-art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Frechet inception distance (FID) over state-of-the-art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.

  • 321.
    Bhunia, Ankan Kumar
    et al.
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Khan, Salman
    Mohamed bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
    Cholakkal, Hisham
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Anwer, Rao Muhammad
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed bin Zayed Univ AI, U Arab Emirates.
    Shah, Mubarak
    Univ Cent Florida, FL 32816 USA.
    Handwriting Transformers (2021). In: 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), IEEE, 2021, p. 1066-1074. Conference paper (Refereed)
    Abstract [en]

    We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement as well as global and local style patterns. The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style features of each query character. To the best of our knowledge, we are the first to introduce a transformer-based network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and outperforms the state-of-the-art demonstrated through extensive qualitative, quantitative and human-based evaluations. The proposed HWT can handle arbitrary length of text and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images. Code is available at: https://github.com/ankanbhunia/HandwritingTransformers

  • 322.
    Bhunia, Ankan Kumar
    et al.
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Khan, Salman
    Mohamed bin Zayed Univ AI, U Arab Emirates; Australian Natl Univ, Australia.
    Cholakkal, Hisham
    Mohamed bin Zayed Univ AI, U Arab Emirates.
    Anwer, Rao Muhammad
    Mohamed bin Zayed Univ AI, U Arab Emirates; Aalto Univ, Finland.
    Laaksonen, Jorma
    Aalto Univ, Finland.
    Shah, Mubarak
    Univ Cent Florida, FL USA.
    Khan, Fahad
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Mohamed bin Zayed Univ AI, U Arab Emirates.
    Person Image Synthesis via Denoising Diffusion Model (2023). In: 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, IEEE COMPUTER SOC, 2023, p. 5968-5976. Conference paper (Refereed)
    Abstract [en]

    The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. The existing approaches use generative adversarial networks that do not necessarily maintain realistic textures or need dense correspondences that struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. Our proposed Person Image Diffusion Model (PIDM) disintegrates the complex transfer problem into a series of simpler forward-backward denoising steps. This helps in learning plausible source-to-target transformation trajectories that result in faithful textures and undistorted appearance details. We introduce a texture diffusion module based on cross-attention to accurately model the correspondences between appearance and pose information available in source and target images. Further, we propose disentangled classifier-free guidance to ensure close resemblance between the conditional inputs and the synthesized output in terms of both pose and appearance information. Our extensive results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios. We also show how our generated images can help in downstream tasks. Code is available at https://github.com/ankanbhunia/PIDM.
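
    The disentangled guidance is the paper's own contribution; the standard classifier-free guidance it builds on combines conditional and unconditional noise predictions at every denoising step. A framework-agnostic sketch of that standard combination (hypothetical model interface, not PIDM's code):

        def guided_noise(model, x_t, t, cond, guidance_scale=2.0):
            """Standard classifier-free guidance for one denoising step.
            `model` is assumed to predict noise from (x_t, t, cond), where
            cond=None selects the unconditional branch."""
            eps_cond = model(x_t, t, cond)     # conditioned on pose/appearance
            eps_uncond = model(x_t, t, None)   # unconditional prediction
            # Push the estimate away from the unconditional prediction and
            # towards the conditional one by the chosen guidance scale.
            return eps_uncond + guidance_scale * (eps_cond - eps_uncond)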

  • 323. Bi, Yin
    et al.
    Lv, Mingsong
    Wei, Yangjie
    Guan, Nan
    Yi, Wang
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Multi-feature fusion for thermal face recognition (2016). In: Infrared physics & technology, ISSN 1350-4495, E-ISSN 1879-0275, Vol. 77, p. 366-374. Article in journal (Refereed)
  • 324.
    Biedermann, Daniel
    et al.
    Goethe University, Germany.
    Ochs, Matthias
    Goethe University, Germany.
    Mester, Rudolf
    Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. Goethe University, Germany.
    Evaluating visual ADAS components on the COnGRATS dataset (2016). In: 2016 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), IEEE, 2016, p. 986-991. Conference paper (Refereed)
    Abstract [en]

    We present a framework that supports the development and evaluation of vision algorithms in the context of driver assistance applications and traffic surveillance. This framework allows the creation of highly realistic image sequences featuring traffic scenarios. The sequences are created with a realistic state of the art vehicle physics model; different kinds of environments are featured, thus providing a wide range of testing scenarios. Due to the physically-based rendering technique and variable camera models employed for the image rendering process, we can simulate different sensor setups and provide appropriate and fully accurate ground truth data.

  • 325.
    Bigun, Josef
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent systems (IS-lab).
    Fronthaler, Hartwig
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
    Kollreider, Klaus
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
    Assuring liveness in biometric identity authentication by real-time face tracking (2004). In: CIHSPS 2004: proceedings of the 2004 IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety: S. Giuliano, Venice, Italy, 21-22 July 2004 / [ed] IEEE, Piscataway, N.J.: IEEE Press, 2004, p. 104-111. Conference paper (Refereed)
    Abstract [en]

    A system that combines real-time face tracking as well as the localization of facial landmarks in order to improve the authenticity of fingerprint recognition is introduced. The intended purpose of this application is to assist in securing public areas and individuals, and to enforce that the collected sensor data in a multimodal person authentication system originate from present persons, i.e. that the system is not under a so-called playback attack. Facial features are extracted with the help of Gabor filters and classified by SVM experts. For real-time performance, selected points from a retinotopic grid are used to form regional face models. Additionally, only a subset of the Gabor decomposition is used for different face regions. The second modality presented is texture-based fingerprint recognition, exploiting linear symmetry. Experimental results on the proposed system are presented.
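
    The facial features here are extracted with Gabor filters; as an illustration of building such a filter bank (the kernel parameters are arbitrary assumptions, not the paper's settings), OpenCV provides ready-made Gabor kernels:

        import cv2
        import numpy as np

        # Small bank of Gabor kernels at four orientations.
        kernels = [
            cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                               lambd=10.0, gamma=0.5, psi=0)
            for theta in np.arange(0, np.pi, np.pi / 4)
        ]

        # Filter a grayscale face image with each kernel; the responses can
        # serve as texture features around facial landmarks.
        face = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)
        responses = [cv2.filter2D(face, cv2.CV_32F, k) for k in kernels]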

  • 326.
    Bigun, Josef
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
    Gustavsson, Tomas
    Chalmers University of Technology, Department of Signals and Systems, Gothenburg, Sweden.
    Image analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, June 29-July 2, 2003, Proceedings (2003). Conference proceedings (editor) (Other academic)
    Abstract [en]

    This book constitutes the refereed proceedings of the 13th Scandinavian Conference on Image Analysis, SCIA 2003, held in Halmstad, Sweden in June/July 2003. The 148 revised full papers presented together with 6 invited contributions were carefully reviewed and selected for presentation. The papers are organized in topical sections on feature extraction, depth and surface, shape analysis, coding and representation, motion analysis, medical image processing, color analysis, texture analysis, indexing and categorization, and segmentation and spatial grouping.

  • 327.
    Bigun, Josef
    et al.
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
    Malmqvist, Kerstin
    Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS).
    Proceedings: Symposium on image analysis, Halmstad, March 7-8, 2000 (2000). Conference proceedings (editor) (Other academic)
  • 328.
    Bigun, Josef
    et al.
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS).
    Verikas, Antanas
    Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), Halmstad Embedded and Intelligent Systems Research (EIS), Intelligent Systems' laboratory.
    Proceedings SSBA '09: Symposium on Image Analysis, Halmstad University, Halmstad, March 18-20, 2009 (2009). Conference proceedings (editor) (Other academic)
  • 329.
    Bilal Akhtar, Muhammad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    The Use of a Convolutional Neural Network in Detecting Soldering Faults from a Printed Circuit Board Assembly (2022). In: HighTech and Innovation Journal, ISSN 2723-9535, Vol. 3, no 1, p. 1-14. Article in journal (Refereed)
    Abstract [en]

    Automatic Optical Inspection (AOI) is any method of detecting defects during a Printed Circuit Board (PCB) manufacturing process. Early AOI methods were based on classic image processing algorithms using a reference PCB. The traditional methods require very complex and inflexible preprocessing stages. With recent advances in the field of deep learning, especially Convolutional Neural Networks (CNN), automating various computer vision tasks has been established. Limited research has been carried out in the past on using CNN for AOI. The present systems are inflexible and require a lot of preprocessing steps or a complex illumination system to improve the accuracy. This paper studies the effectiveness of using CNN to detect soldering bridge faults in a PCB assembly. The paper presents a method for designing an optimized CNN architecture to detect soldering faults in a PCBA. The proposed CNN architecture is compared with the state-of-the-art object detection architecture, namely YOLO, with respect to detection accuracy, processing time, and memory requirement. The results of our experiments show that the proposed CNN architecture has 3.0% better average precision, has 50% fewer parameters, and infers in half the time compared to YOLO. The experimental results prove the effectiveness of using CNN in AOI by using images of a PCB assembly without any reference image, any complex preprocessing stage, or a complex illumination system.
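
    The optimized architecture itself is specified in the paper; purely as an illustration of the kind of compact CNN classifier such an AOI setup might run on cropped solder-joint patches (the layer sizes are assumptions, not the authors' design), a PyTorch sketch:

        import torch.nn as nn

        class SolderNet(nn.Module):
            """Tiny binary classifier: solder-joint patch -> ok / bridge fault."""
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.classifier = nn.Linear(64, 2)   # two classes: ok / fault

            def forward(self, x):
                return self.classifier(self.features(x).flatten(1))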

  • 330.
    Billah, Mohammad Ehtasham
    et al.
    School of Business, Örebro University, Örebro, Sweden.
    Javed, Farrukh
    Örebro University, Örebro University School of Business.
    Bayesian Convolutional Neural Network-based Models for Diagnosis of Blood Cancer (2022). In: Applied Artificial Intelligence, ISSN 0883-9514, E-ISSN 1087-6545, Vol. 36, no 1. Article in journal (Refereed)
    Abstract [en]

    Deep learning methods allow computational models involving multiple processing layers to discover intricate structures in data sets. Classifying an image is one such problem where these methods are found to be very useful. Although different approaches have been proposed in the literature, this paper illustrates a successful implementation of a Bayesian Convolutional Neural Network (BCNN)-based classification procedure to classify microscopic images of blood samples (lymphocyte cells) without involving manual feature extraction. The data set contains 260 microscopic images of cancerous and noncancerous lymphocyte cells. We experiment with different network structures and obtain the model that returns the lowest error rate in classifying the images. Our developed models not only produce high accuracy in classifying cancerous and noncancerous lymphocyte cells but also provide useful information regarding uncertainty in predictions.
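
    The BCNN details are in the paper; one common, lightweight way to approximate Bayesian inference in a CNN, shown here only as a sketch of the general idea, is Monte Carlo dropout: keep dropout active at test time and average several stochastic forward passes to obtain a predictive mean and an uncertainty estimate.

        import torch

        def mc_dropout_predict(model, x, n_samples=50):
            """Approximate Bayesian prediction via Monte Carlo dropout.
            `model` is any network containing dropout layers; `x` is a batch of
            images. Returns per-class mean probability and its std (uncertainty)."""
            model.train()   # keep dropout layers stochastic at test time
            with torch.no_grad():
                probs = torch.stack([
                    torch.softmax(model(x), dim=1) for _ in range(n_samples)
                ])
            return probs.mean(dim=0), probs.std(dim=0)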

  • 331.
    Billing, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Cognition Rehearsed: Recognition and Reproduction of Demonstrated Behavior (2012). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this dissertation investigates techniques for robot Learning from Demonstration (LFD). LFD is a well established approach where the robot is to learn from a set of demonstrations. The dissertation focuses on LFD where a human teacher demonstrates a behavior by controlling the robot via teleoperation. After demonstration, the robot should be able to reproduce the demonstrated behavior under varying conditions. In particular, the dissertation investigates techniques where previous behavioral knowledge is used as bias for generalization of demonstrations.

    The primary contribution of this work is the development and evaluation of a semi-reactive approach to LFD called Predictive Sequence Learning (PSL). PSL has many interesting properties applied as a learning algorithm for robots. Few assumptions are introduced and little task-specific configuration is needed. PSL can be seen as a variable-order Markov model that progressively builds up the ability to predict or simulate future sensory-motor events, given a history of past events. The knowledge base generated during learning can be used to control the robot, such that the demonstrated behavior is reproduced. The same knowledge base can also be used to recognize an on-going behavior by comparing predicted sensor states with actual observations. Behavior recognition is an important part of LFD, both as a way to communicate with the human user and as a technique that allows the robot to use previous knowledge as parts of new, more complex, controllers.

    In addition to the work on PSL, this dissertation provides a broad discussion on representation, recognition, and learning of robot behavior. LFD-related concepts such as demonstration, repetition, goal, and behavior are defined and analyzed, with focus on how bias is introduced by the use of behavior primitives. This analysis results in a formalism where LFD is described as transitions between information spaces. Assuming that the behavior recognition problem is partly solved, ways to deal with remaining ambiguities in the interpretation of a demonstration are proposed.

    The evaluation of PSL shows that the algorithm can efficiently learn and reproduce simple behaviors. The algorithm is able to generalize to previously unseen situations while maintaining the reactive properties of the system. As the complexity of the demonstrated behavior increases, knowledge of one part of the behavior sometimes interferes with knowledge of other parts. As a result, different situations with similar sensory-motor interactions are sometimes confused and the robot fails to reproduce the behavior.

    One way to handle these issues is to introduce a context layer that can support PSL by providing bias for predictions. Parts of the knowledge base that appear to fit the present context are highlighted, while other parts are inhibited. Which context should be active is continually re-evaluated using behavior recognition. This technique takes inspiration from several neurocomputational models that describe parts of the human brain as a hierarchical prediction system. With behavior recognition active, continually selecting the most suitable context for the present situation, the problem of knowledge interference is significantly reduced and the robot can successfully reproduce also more complex behaviors.

    Download full text (pdf)
    Introduction chapters
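
    As a rough illustration of the kind of sequence prediction described in the abstract (a toy sketch, not the dissertation's PSL implementation), a predictor can record which event tends to follow each recently observed suffix of the event history and back off to shorter suffixes when a long one has not been seen:

        from collections import defaultdict, Counter

        class SuffixPredictor:
            """Toy variable-order predictor over discrete sensory-motor events."""
            def __init__(self, max_order=4):
                self.max_order = max_order
                self.counts = defaultdict(Counter)   # suffix -> next-event counts

            def train(self, sequence):
                for i in range(1, len(sequence)):
                    for order in range(1, min(self.max_order, i) + 1):
                        suffix = tuple(sequence[i - order:i])
                        self.counts[suffix][sequence[i]] += 1

            def predict(self, history):
                # Use the longest observed suffix of the history.
                for order in range(min(self.max_order, len(history)), 0, -1):
                    suffix = tuple(history[-order:])
                    if suffix in self.counts:
                        return self.counts[suffix].most_common(1)[0][0]
                return None

        # Example: learn a repeating pattern and predict the next event.
        p = SuffixPredictor()
        p.train(list("abcabcabc"))
        print(p.predict(list("ab")))   # -> 'c'
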
  • 332.
    Billing, Erik
    Umeå University, Department of Computing Science.
    Cognition Rehearsed: Recognition and Reproduction of Demonstrated Behavior (2012). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this dissertation investigates techniques for robot Learning from Demonstration (LFD). LFD is a well established approach where the robot is to learn from a set of demonstrations. The dissertation focuses on LFD where a human teacher demonstrates a behavior by controlling the robot via teleoperation. After demonstration, the robot should be able to reproduce the demonstrated behavior under varying conditions. In particular, the dissertation investigates techniques where previous behavioral knowledge is used as bias for generalization of demonstrations.

    The primary contribution of this work is the development and evaluation of a semi-reactive approach to LFD called Predictive Sequence Learning (PSL). PSL has many interesting properties applied as a learning algorithm for robots. Few assumptions are introduced and little task-specific configuration is needed. PSL can be seen as a variable-order Markov model that progressively builds up the ability to predict or simulate future sensory-motor events, given a history of past events. The knowledge base generated during learning can be used to control the robot, such that the demonstrated behavior is reproduced. The same knowledge base can also be used to recognize an on-going behavior by comparing predicted sensor states with actual observations. Behavior recognition is an important part of LFD, both as a way to communicate with the human user and as a technique that allows the robot to use previous knowledge as parts of new, more complex, controllers.

    In addition to the work on PSL, this dissertation provides a broad discussion on representation, recognition, and learning of robot behavior. LFD-related concepts such as demonstration, repetition, goal, and behavior are defined and analyzed, with focus on how bias is introduced by the use of behavior primitives. This analysis results in a formalism where LFD is described as transitions between information spaces. Assuming that the behavior recognition problem is partly solved, ways to deal with remaining ambiguities in the interpretation of a demonstration are proposed.

    The evaluation of PSL shows that the algorithm can efficiently learn and reproduce simple behaviors. The algorithm is able to generalize to previously unseen situations while maintaining the reactive properties of the system. As the complexity of the demonstrated behavior increases, knowledge of one part of the behavior sometimes interferes with knowledge of other parts. As a result, different situations with similar sensory-motor interactions are sometimes confused and the robot fails to reproduce the behavior.

    One way to handle these issues is to introduce a context layer that can support PSL by providing bias for predictions. Parts of the knowledge base that appear to fit the present context are highlighted, while other parts are inhibited. Which context should be active is continually re-evaluated using behavior recognition. This technique takes inspiration from several neurocomputational models that describe parts of the human brain as a hierarchical prediction system. With behavior recognition active, continually selecting the most suitable context for the present situation, the problem of knowledge interference is significantly reduced and the robot can successfully reproduce more complex behaviors as well.

    Download full text (pdf)
    FULLTEXT01
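    A minimal sketch of the variable-order prediction idea described in the entry above, assuming a toy table-based learner (hypothetical names, not the dissertation's implementation): contexts of increasing length map to counts of the sensory-motor event that followed, and prediction prefers the longest known context.

    from collections import defaultdict

    class SequencePredictor:
        """Toy variable-order predictor over discrete sensory-motor events."""
        def __init__(self, max_order=4):
            self.max_order = max_order
            self.counts = defaultdict(lambda: defaultdict(int))  # context -> next event -> count

        def update(self, history, next_event):
            # Strengthen hypotheses for every context length ending at the present.
            for order in range(1, self.max_order + 1):
                if len(history) >= order:
                    self.counts[tuple(history[-order:])][next_event] += 1

        def predict(self, history):
            # Prefer the longest previously seen context; fall back to shorter ones.
            for order in range(min(self.max_order, len(history)), 0, -1):
                context = tuple(history[-order:])
                if context in self.counts:
                    options = self.counts[context]
                    return max(options, key=options.get)
            return None

    # Toy usage: events are (sensor, motor) pairs from a short demonstration.
    demo = [("wall", "turn"), ("free", "forward"), ("free", "forward"), ("wall", "turn")]
    p = SequencePredictor()
    for i in range(1, len(demo)):
        p.update(demo[:i], demo[i])
    print(p.predict([("wall", "turn")]))  # -> ('free', 'forward')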
  • 333.
    Billing, Erik
    et al.
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre.
    Balkenius, Christian
    Lund University Cognitive Science, Lund, Sweden.
    Modeling the Interplay between Conditioning and Attention in a Humanoid Robot: Habituation and Attentional Blocking2014In: IEEE ICDL-EPIROB 2014: The Fourth Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, October 13-16, 2014 Palazzo Ducale, Genoa, Italy, IEEE conference proceedings, 2014, p. 41-47Conference paper (Refereed)
    Abstract [en]

    A novel model of the role of conditioning in attention is presented and evaluated on a Nao humanoid robot. The model implements conditioning and habituation in interaction with a dynamic neural field where different stimuli compete for activation. The model can be seen as a demonstration of how stimulus-selection and action-selection can be combined and illustrates how positive or negative reinforcement has different effects on attention and action. Attention is directed toward both rewarding and punishing stimuli, but appetitive actions are only directed toward positive stimuli. We present experiments where the model is used to control a Nao robot in a task where it can select between two objects. The model demonstrates some emergent effects also observed in similar experiments with humans and animals, including attentional blocking and latent inhibition.
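    The blocking effect mentioned in this abstract can be reproduced with a much simpler associative model. The sketch below uses the classic Rescorla-Wagner update as a hedged stand-in (it is not the paper's dynamic-neural-field model): a cue pre-trained to predict reward leaves little prediction error for a cue added later, so the later cue acquires little strength.

    import numpy as np

    def rescorla_wagner(trials, n_cues, alpha=0.3, reward=1.0):
        """trials: list of (active_cue_indices, reinforced) pairs."""
        V = np.zeros(n_cues)                    # associative strengths
        for cues, reinforced in trials:
            error = (reward if reinforced else 0.0) - V[list(cues)].sum()
            for c in cues:
                V[c] += alpha * error           # shared error drives all active cues
        return V

    # Phase 1: cue 0 alone is rewarded. Phase 2: cues 0 and 1 together are rewarded.
    trials = [([0], True)] * 20 + [([0, 1], True)] * 20
    print(rescorla_wagner(trials, n_cues=2))    # cue 1 ends up far weaker than cue 0: blocking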

  • 334.
    Billing, Erik
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Hellström, Thomas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Janlert, Lars Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Robot learning from demonstration using predictive sequence learning2011In: Robotic systems: applications, control and programming / [ed] Ashish Dutta, Kanpur, India: IN-TECH, 2011, p. 235-250Chapter in book (Refereed)
    Abstract [en]

    In this chapter, the prediction algorithm Predictive Sequence Learning (PSL) is presented and evaluated in a robot Learning from Demonstration (LFD) setting. PSL generates hypotheses from a sequence of sensory-motor events. Generated hypotheses can be used as a semi-reactive controller for robots. PSL has previously been used as a method for LFD, but suffered from combinatorial explosion when applied to data with many dimensions, such as high dimensional sensor and motor data. A new version of PSL, referred to as Fuzzy Predictive Sequence Learning (FPSL), is presented and evaluated in this chapter. FPSL is implemented as a Fuzzy Logic rule base and works on a continuous state space, in contrast to the discrete state space used in the original design of PSL. The evaluation of FPSL shows a significant performance improvement in comparison to the discrete version of the algorithm. Applied to an LFD task in a simulated apartment environment, the robot is able to learn to navigate to a specific location, starting from an unknown position in the apartment.
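    A hedged sketch of how fuzzy rules over a continuous sensor space can drive prediction, in the spirit of (but much simpler than) the FPSL rule base described above; the Gaussian memberships, state dimensions, and example rules are illustrative assumptions.

    import numpy as np

    class FuzzyRuleBase:
        """Each rule pairs a prototype sensor state with the motor command that followed it."""
        def __init__(self, width=0.5):
            self.width = width
            self.states, self.commands = [], []

        def add_rule(self, state, command):
            self.states.append(np.asarray(state, float))
            self.commands.append(np.asarray(command, float))

        def predict(self, state):
            s = np.asarray(state, float)
            memberships = np.array([
                np.exp(-np.sum((s - proto) ** 2) / (2 * self.width ** 2))
                for proto in self.states
            ])
            weights = memberships / memberships.sum()
            return weights @ np.array(self.commands)   # membership-weighted command

    rb = FuzzyRuleBase()
    rb.add_rule([0.2, 1.0], [0.5, 0.0])   # open space ahead -> drive forward
    rb.add_rule([0.9, 0.1], [0.0, 0.8])   # obstacle ahead    -> turn
    print(rb.predict([0.8, 0.2]))         # close to the second rule -> mostly a turn command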

  • 335.
    Billing, Erik
    et al.
    Department of Computing Science, Umeå University, Sweden.
    Hellström, Thomas
    Department of Computing Science, Umeå University, Sweden.
    Janlert, Lars-Erik
    Department of Computing Science, Umeå University, Sweden.
    Robot learning from demonstration using predictive sequence learning2012In: Robotic systems: applications, control and programming / [ed] Ashish Dutta, Kanpur, India: IN-TECH , 2012, p. 235-250Chapter in book (Refereed)
    Abstract [en]

    In this chapter, the prediction algorithm Predictive Sequence Learning (PSL) is presented and evaluated in a robot Learning from Demonstration (LFD) setting. PSL generates hypotheses from a sequence of sensory-motor events. Generated hypotheses can be used as a semi-reactive controller for robots. PSL has previously been used as a method for LFD, but suffered from combinatorial explosion when applied to data with many dimensions, such as high dimensional sensor and motor data. A new version of PSL, referred to as Fuzzy Predictive Sequence Learning (FPSL), is presented and evaluated in this chapter. FPSL is implemented as a Fuzzy Logic rule base and works on a continuous state space, in contrast to the discrete state space used in the original design of PSL. The evaluation of FPSL shows a significant performance improvement in comparison to the discrete version of the algorithm. Applied to an LFD task in a simulated apartment environment, the robot is able to learn to navigate to a specific location, starting from an unknown position in the apartment.

  • 336. Billing, Erik
    et al.
    Hellström, Thomas
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Janlert, Lars-Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Simultaneous recognition and reproduction of demonstrated behavior2015In: Biologically Inspired Cognitive Architectures, ISSN 2212-683X, Vol. 12, p. 43-53Article in journal (Refereed)
    Abstract [en]

    Prediction of sensory-motor interactions with the world is often referred to as a key component in cognition. We here demonstrate that prediction of sensory-motor events, i.e., relationships between percepts and actions, is sufficient to learn navigation skills for a robot navigating in an apartment environment. In the evaluated application, the simulated Robosoft Kompai robot learns from human demonstrations. The system builds fuzzy rules describing temporal relations between sensory-motor events recorded while a human operator is tele-operating the robot. With this architecture, referred to as Predictive Sequence Learning (PSL), learned associations can be used to control the robot and to predict expected sensor events in response to executed actions. The predictive component of PSL is used in two ways: (1) to identify which behavior best matches the current context and (2) to decide when to learn, i.e., update the confidence of different sensory-motor associations. Using this approach, knowledge interference due to over-fitting of an increasingly complex world model can be avoided. The system can also automatically estimate the confidence in the currently executed behavior and decide when to switch to an alternate behavior. The performance of PSL as a method for learning from demonstration is evaluated with, and without, contextual information. The results indicate that PSL without contextual information can learn and reproduce simple behaviors, but fails when the behavioral repertoire becomes more diverse. When a contextual layer is added, PSL successfully identifies the most suitable behavior in almost all test cases. The robot's ability to reproduce more complex behaviors, with partly overlapping and conflicting information, significantly increases with the use of contextual information. The results support a further development of PSL as a component of a dynamic hierarchical system performing control and predictions on several levels of abstraction.
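    The context-selection idea in the abstract above (pick the behavior whose predictions best match what the sensors actually report) can be illustrated with a small sketch; the predictor functions and sensor vectors here are hypothetical placeholders, not the published system.

    import numpy as np

    def select_behavior(predictors, history, observed):
        """predictors: dict name -> callable(history) returning a predicted sensor vector."""
        errors = {name: float(np.linalg.norm(predict(history) - np.asarray(observed)))
                  for name, predict in predictors.items()}
        return min(errors, key=errors.get), errors

    predictors = {
        "follow_corridor": lambda h: np.array([0.2, 0.2]),   # hypothetical learned predictors
        "dock_at_station": lambda h: np.array([0.9, 0.1]),
    }
    print(select_behavior(predictors, history=[], observed=[0.85, 0.15]))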

  • 337.
    Billing, Erik
    et al.
    University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment.
    Rosén, Julia
    University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment.
    Lamb, Maurice
    University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. University of Skövde, School of Engineering Science. University of Skövde, Virtual Engineering Research Environment.
    Language Models for Human-Robot Interaction2023In: HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, ACM Digital Library, 2023, p. 905-906Conference paper (Refereed)
    Abstract [en]

    Recent advances in large scale language models have significantly changed the landscape of automatic dialogue systems and chatbots. We believe that these models also have a great potential for changing the way we interact with robots. Here, we present the first integration of the OpenAI GPT-3 language model for the Aldebaran Pepper and Nao robots. The present work transforms the text-based API of GPT-3 into an open verbal dialogue with the robots. The system will be presented live during the HRI2023 conference and the source code of this integration is shared with the hope that it will serve the community in designing and evaluating new dialogue systems for robots.

    Download full text (pdf)
    fulltext
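    As a rough sketch of how a text-completion model can be wired into a spoken dialogue loop of the kind demonstrated above: the model name, prompt framing, and the listen()/speak() stubs are assumptions for illustration (on a Pepper or Nao these would go through the robot's own speech-recognition and text-to-speech services), and the call assumes the pre-1.0 OpenAI Python client.

    import openai  # assumes the pre-1.0 OpenAI Python client and an API key in the environment

    def listen():
        # Placeholder: replace with the robot's speech-recognition result.
        return input("User: ")

    def speak(text):
        # Placeholder: replace with the robot's text-to-speech call.
        print("Robot:", text)

    history = "The following is a friendly conversation between a human and a robot.\n"
    while True:
        history += f"Human: {listen()}\nRobot:"
        response = openai.Completion.create(
            model="text-davinci-003",   # assumed GPT-3 completion model
            prompt=history,
            max_tokens=100,
            stop=["Human:"],
        )
        reply = response.choices[0].text.strip()
        history += f" {reply}\n"
        speak(reply)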
  • 338.
    Birgersson, Anna
    et al.
    Linköping University, Department of Electrical Engineering, Computer Vision.
    Hellgren, Klara
    Linköping University, Department of Electrical Engineering, Computer Vision.
    Texture Enhancement in 3D Maps using Generative Adversarial Networks2019Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    In this thesis we investigate the use of GANs for texture enhancement. To achieve this, we have studied whether synthetic satellite images generated by GANs will improve the texture in satellite-based 3D maps.

    We investigate two GANs: SRGAN and pix2pix. SRGAN increases the pixel resolution of the satellite images by generating upsampled images from low-resolution images. As for pix2pix, the GAN performs image-to-image translation by translating a source image to a target image, without changing the pixel resolution.

    We trained the GANs in two different approaches, named SAT-to-AER and SAT-to-AER-3D, where SAT, AER and AER-3D are different datasets provided by the company Vricon. In the first approach, aerial images were used as ground truth, and in the second approach, rendered images from an aerial-based 3D map were used as ground truth.

    The procedure of enhancing the texture in a satellite-based 3D map was divided into two steps: the generation of synthetic satellite images and the re-texturing of the 3D map. Synthetic satellite images generated by two SRGAN models and one pix2pix model were used for the re-texturing. The best results were obtained using SRGAN in the SAT-to-AER approach, where the re-textured 3D map had enhanced structures and an increased perceived quality. SRGAN also presented a good result in the SAT-to-AER-3D approach, where the re-textured 3D map had a changed color distribution and the road markers were easier to distinguish from the ground. The images generated by the pix2pix model presented the worst results. As for the SAT-to-AER approach, even though the synthetic satellite images generated by pix2pix were somewhat enhanced and contained less noise, they had no significant impact on the re-texturing. In the SAT-to-AER-3D approach, none of the investigated models based on the pix2pix framework presented any successful results.

    We concluded that GANs can be used as a texture enhancer using both aerial images and images rendered from an aerial-based 3D map as ground truth. The use of GANs as a texture enhancer has great potential, and there are several interesting areas for future work.

    Download full text (pdf)
    fulltext
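    The resolution increase that SRGAN-style generators perform is built from upsampling blocks of the kind sketched below (a convolution followed by a pixel shuffle); this is a generic, hedged illustration, not the thesis's trained models, and the channel counts are placeholders.

    import torch
    import torch.nn as nn

    class UpsampleBlock(nn.Module):
        def __init__(self, channels=64, scale=2):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)   # rearranges channels into spatial resolution
            self.act = nn.PReLU()

        def forward(self, x):
            return self.act(self.shuffle(self.conv(x)))

    x = torch.randn(1, 64, 32, 32)          # low-resolution feature map
    print(UpsampleBlock()(x).shape)         # -> torch.Size([1, 64, 64, 64])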
  • 339.
    Birindwa, Fleury
    Jönköping University, School of Engineering, JTH, Computer Science and Informatics.
    Prestandajämförelse mellan Xception, InceptionV3 och MobileNetV2 för bildklassificering på nätpaneler2020Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
    Abstract [en]

    In recent years, deep learning models have been used in almost all areas, from industry to academia, specifically for image classification. However, these models are huge in size, with millions of parameters, making them difficult to distribute to smaller devices with limited resources, such as mobile phones. This study addresses lightweight pre-trained convolutional neural network models, which are state of the art in deep learning and whose size makes them suitable as base models for mobile application development.

    The purpose of this study is to evaluate the performance of Xception, InceptionV3 and MobileNetV2 in order to facilitate the selection of a lightweight convolutional network as a base for the development of mobile applications in image classification. To achieve this purpose, the models were implemented using transfer learning and designed to distinguish images of mesh panels from the company Troax. The study covers the method that allows transfer of knowledge from an existing model to a new model, explains how the training and test processes were carried out, and analyses the results.

    Results showed that Xception had 86% accuracy and a processing time of 10 minutes on 2000 training images and 1000 test images. Xception's performance was the best among these models. The difference between Xception and InceptionV3 was 10% in accuracy and 2 minutes in processing time. Between Xception and MobileNetV2 there was a difference of 23% in accuracy and 3 minutes in processing time. Experiments showed that the models performed less well with smaller training sets, below 800 images. Above 800 images, each model began to predict with over 70% accuracy.

    Download full text (pdf)
    fulltext
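    A minimal transfer-learning sketch of the kind compared in the thesis above, using a pre-trained Xception base in Keras; the input size, class count, and commented-out dataset names are placeholders, since the Troax mesh-panel data is of course not reproduced here.

    import tensorflow as tf
    from tensorflow.keras.applications import Xception

    base = Xception(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
    base.trainable = False                           # keep the pre-trained features fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),   # e.g. two panel classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_dataset, validation_data=val_dataset, epochs=10)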
  • 340.
    Bjervig, Joel
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control.
    Unsupervised Image Classification Using Domain Adaptation: Via the Second Order Statistic2022Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [sv]

    The success of machine learning and deep learning depends to a large extent on large, annotated datasets. Assigning labels to data is very resource-intensive and can to some degree be avoided by exploiting the statistical properties of the data. A machine learning model can learn to classify images from a domain using training examples that contain images together with labels describing what the images depict. But what do you do if the data has no labels? A machine learning model that learns a task from annotated data in a source domain can, with the help of information from the target domain (which has no labels), be adapted to perform better on data from the target domain. The research field that studies how to adapt and generalize a model between two different domains is called domain adaptation.

    This thesis was carried out at Scania's research department for autonomous transport and concerns how image classification models trained on labeled camera images can be adapted to achieve higher accuracy on an unlabeled dataset of LiDAR images. Two domain adaptation methods were compared with each other, as well as a model trained on camera data through supervised learning without domain adaptation. All methods operate in some way on a deep convolutional neural network (CNN) whose task is to classify images of cars or pedestrians. The covariance of the data from the source and target domains is the central measure for the domain adaptation methods in this project. The first method is a so-called shallow method, where the adaptation itself is not part of the model's deep architecture but is an intermediate step in the process. The second method integrates the domain adaptation with the classification inside the deep architecture. The third model consists only of the convolutional network, without any domain adaptation method, and is used as a reference.

    The model trained on the camera images without a domain adaptation method classifies the LiDAR images with an accuracy of 63.80%, while the "shallow" method reaches an accuracy of 74.67% and the deep method performs best with 80.73%. The results show that it is possible to adapt a model trained on data from the source domain to achieve higher classification accuracy in the target domain by using the covariance of the data from the two domains. The deep method for domain adaptation also allows other statistical measures to be used, which may be more successful in generalizing the model depending on how the data is distributed. The superiority of the deep method suggests that domain adaptation can advantageously be embedded in the deep architecture, so that the model parameters are updated to learn a more robust representation of the target domain.

    Download full text (pdf)
    thesis_paper_joel_bjervig
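    The "shallow" covariance-based adaptation described above resembles correlation alignment: whiten the source features and re-color them with the target covariance, then train the classifier on the aligned features. The numpy sketch below is a hedged illustration under that assumption, not the thesis's exact pipeline, and the feature arrays are synthetic placeholders.

    import numpy as np

    def coral_align(source, target, eps=1e-5):
        """source, target: (n_samples, n_features) arrays of extracted CNN features."""
        cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
        ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
        def mat_pow(m, p):                       # matrix power of a symmetric PSD matrix
            vals, vecs = np.linalg.eigh(m)
            return vecs @ np.diag(vals ** p) @ vecs.T
        return source @ mat_pow(cs, -0.5) @ mat_pow(ct, 0.5)

    rng = np.random.default_rng(0)
    camera_feats = rng.normal(size=(200, 16))             # labeled source features
    lidar_feats = 2.0 * rng.normal(size=(200, 16)) + 1.0  # unlabeled target features
    aligned = coral_align(camera_feats, lidar_feats)
    print(np.allclose(np.cov(aligned, rowvar=False),
                      np.cov(lidar_feats, rowvar=False), atol=0.5))   # covariances now match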
  • 341.
    Bjurling, Oscar
    et al.
    RISE Research Institutes of Sweden.
    Arvola, Mattias
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences.
    Ziemke, Tom
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
    Swarms, teams, or choirs?: Metaphors in multi-UAV systems design2021In: Advances in Human Factors in Robots, Unmanned Systems and Cybersecurity / [ed] Matteo Zallio, Carlos Raymundo Ibañez, Jesus Hechavarria Hernandez, Cham, 2021, p. 10-15Conference paper (Refereed)
    Abstract [en]

    Future Unmanned Aerial Vehicles (UAVs) are projected to fly and operate in swarms. The swarm metaphor makes explicit and implicit mappings regarding system architecture and human interaction to aspects of natural systems, such as bee societies. Compared to the metaphor of a team, swarming agents as individuals are less capable, more expendable, and more limited in terms of communication and coordination. Given their different features and limitations, the two metaphors could be useful in different scenarios. We also discuss a choir metaphor and illustrate how it can give rise to different design concepts. We conclude that designers and engineers should be mindful of the metaphors they use because they influence—and limit—how to think about and design for multi-UAV systems.

    Download full text (pdf)
    fulltext
  • 342.
    Bjurling, Oscar
    et al.
    Digital Systems, RISE Research Institutes of Sweden, Linköping, Sweden.
    Granlund, Rego
    Digital Systems, RISE Research Institutes of Sweden, Linköping, Sweden.
    Alfredson, Jens
    Aeronautics, Saab AB, Linköping, Sweden.
    Arvola, Mattias
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Arts and Sciences.
    Ziemke, Tom
    Linköping University, Department of Computer and Information Science, Human-Centered systems. Linköping University, Faculty of Science & Engineering.
    Drone Swarms in Forest Firefighting: A Local Development Case Study of Multi-Level Human-Swarm Interaction2020In: Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society, New York, NY, USA: Association for Computing Machinery (ACM), 2020, article id 93Conference paper (Refereed)
    Abstract [en]

    Swarms of autonomous and coordinating Unmanned Aerial Vehicles (UAVs) are rapidly being developed to enable simultaneous control of multiple UAVs. In the field of Human-Swarm Interaction (HSI), researchers develop and study swarm algorithms and various means of control and evaluate their cognitive and task performance. There is, however, a lack of research describing how UAV swarms will fit into future real-world domain contexts. To remedy this, this paper describes a case study conducted within the community of firefighters, more precisely two Swedish fire departments that regularly deploy UAVs in fire responses. Based on an initial description of how their UAVs are used in a forest firefighting context, participating UAV operators and unit commanders envisioned a scenario that showed how the swarm and its capabilities could be utilized given the constraints and requirements of a forest firefighting mission. Based on this swarm scenario description we developed a swarm interaction model that describes how the operators’ interaction traverses multiple levels ranging from the entire swarm, via subswarms and individual UAVs, to specific sensors and equipment carried by the UAVs. The results suggest that human-in-the-loop simulation studies need to enable interaction across multiple swarm levels as this interaction may exert additional cognitive strain on the human operator.

    Download full text (pdf)
    fulltext
  • 343.
    Björk, Ingrid
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kavathatzopoulos, Iordanis
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Robots, ethics and language2015In: Computers & Society: The Newsletter of the ACM Special Interest Group on Computers and Society Special Issue on 20 Years of ETHICOMP / [ed] Mark Coeckelbergh, Bernd Stahl, and Catherine Flick; Vaibhav Garg and Dee Weikle, ACM Digital Library, 2015, p. 268-273Conference paper (Refereed)
    Abstract [en]

    Following the classical philosophical definition of ethics and the psychological research on problem solving and decision making, the issue of ethics becomes concrete and opens up the way for the creation of IT systems that can support the handling of moral problems, in a sense similar to the way humans handle their moral problems. The processes of communicating information and receiving instructions are linguistic by nature. Moreover, autonomous and heteronomous ethical thinking is expressed by way of language use. Indeed, the way we think ethically is not only linguistically mediated but linguistically construed – whether we think for example in terms of conviction and certainty (meaning heteronomy) or in terms of questioning and inquiry (meaning autonomy). A thorough analysis of the language that is used in these processes is therefore of vital importance for the development of the above mentioned tools and methods. Given that we have a clear definition based on philosophical theories and on research on human decision-making and linguistics, we can create and apply systems that can handle ethical issues. Such systems will help us to design robots and to prescribe their actions, to communicate and cooperate with them, to control the moral aspects of robots’ actions in real life applications, and to create embedded systems that allow continuous learning and adaptation.

  • 344.
    Björk, Nils
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Simple feature detection in indoor geometry scanned with the Microsoft Hololens2020Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    The aim of this work was to determine whether line-type features (straight lines found in geometry considered interesting by a user) could be identified in spatial map data of indoor environments produced by the Microsoft Hololens augmented reality headset. Five different data sets were used in this work on which the feature detection was performed; these data sets were provided as sample data representing the spatial map of five different rooms scanned using the Hololens headset, which are available as part of the Hololens emulator. Related work on feature detection in point clouds and 3D meshes was investigated to try and find a suitable method to achieve line-type feature detection. The chosen detection method used LSQ-plane fitting and relevant cutoff variables to achieve this, which was inspired by related work on the subject of feature identification and mesh simplification. The method was evaluated using user-placed validation features, and the distance between them and the detected features, defined using the midpoint distance metric, was used as a measure of quality for the detected features. The resulting features were not accurate enough to reliably or consistently match the validation features inserted in the data, and further improvements to the detection method would be necessary to achieve this. A local feature-edge detection using the SOD & ESOD operators was considered and tested but was found not to be suitable for the spatial data provided by the Hololens emulator. The results show that finding these features using the provided data is possible, and the methods to produce them numerous. The choice of method is, however, dependent on the ultimate application of these features, taking into account requirements for accuracy and performance.

    Download full text (pdf)
    fulltext
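    The LSQ-plane fitting that the thesis builds on is commonly done by taking the plane through the centroid whose normal is the direction of least variance; the sketch below is that generic geometry step (not the thesis code), with synthetic points as placeholders.

    import numpy as np

    def fit_plane(points):
        """points: (n, 3) array. Returns (centroid, unit normal, RMS point-to-plane distance)."""
        centroid = points.mean(axis=0)
        centered = points - centroid
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normal = vt[-1]                          # singular vector with the smallest singular value
        distances = centered @ normal
        return centroid, normal, float(np.sqrt(np.mean(distances ** 2)))

    rng = np.random.default_rng(1)
    xy = rng.uniform(-1, 1, size=(100, 2))
    pts = np.column_stack([xy, 0.05 * rng.normal(size=100)])   # noisy points near the z = 0 plane
    c, n, rms = fit_plane(pts)
    print(n, rms)   # normal close to (0, 0, +/-1), small residual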
  • 345. Björkman, Eva
    et al.
    Zagal, Juan Cristobal
    Lindeberg, Tony
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Roland, Per E.
    Evaluation of design options for the scale-space primal sketch analysis of brain activation images2000In: : HBM'00, published in Neuroimage, volume 11, number 5, 2000, 2000, Vol. 11, p. 656-656Conference paper (Refereed)
    Abstract [en]

    A key issue in brain imaging concerns how to detect the functionally activated regions from PET and fMRI images. In earlier work, it has been shown that the scale-space primal sketch provides a useful tool for such analysis [1]. The method includes presmoothing with different filter widths and automatic estimation of the spatial extent of the activated regions (blobs).

    The purpose is to present two modifications of the scale-space primal sketch, as well as a quantitative evaluation which shows that these modifications improve the performance, measured as the separation between blob descriptors extracted from PET images and from noise images. This separation is essential for future work of associating a statistical p-value with the scale-space blob descriptors.

    Download full text (pdf)
    fulltext
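    As a hedged illustration of the basic operation behind such an analysis (presmoothing at several filter widths and looking for blob-like responses), the sketch below computes scale-normalized Laplacian-of-Gaussian responses on a synthetic image; it is not the primal-sketch implementation evaluated in the abstract, and the image and scales are placeholders.

    import numpy as np
    from scipy.ndimage import gaussian_laplace

    def blob_responses(image, sigmas=(1, 2, 4, 8)):
        """Scale-normalized Laplacian-of-Gaussian responses, one map per filter width."""
        return {s: (s ** 2) * -gaussian_laplace(image.astype(float), sigma=s) for s in sigmas}

    # Toy "activation image": one bright Gaussian region on a noisy background.
    rng = np.random.default_rng(0)
    img = rng.normal(0.0, 0.1, (64, 64))
    yy, xx = np.mgrid[:64, :64]
    img += np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))

    responses = blob_responses(img)
    best = max(responses, key=lambda s: responses[s][32, 32])
    print(best)   # the scale closest to the blob width (4 here) gives the strongest center response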
  • 346.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Bekiroglu, Yasemin
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
    Learning to Disambiguate Object Hypotheses through Self-Exploration2014In: 14th IEEE-RAS International Conference onHumanoid Robots, IEEE Computer Society, 2014Conference paper (Refereed)
    Abstract [en]

    We present a probabilistic learning framework to form object hypotheses through interaction with the environment. A robot learns how to manipulate objects through pushing actions to identify how many objects are present in the scene. We use a segmentation system that initializes object hypotheses based on RGBD data and adopt a reinforcement learning approach to learn the relations between pushing actions and their effects on object segmentations. Trained models are used to generate actions that result in a minimum number of pushes on object groups, until either object separation events are observed or it is ensured that there is only one object acted on. We provide baseline experiments showing that a policy based on reinforcement learning for action selection results in fewer pushes than if pushing actions were selected randomly.

  • 347.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bekiroglu, Yasemin
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Högman, Virgile
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Enhancing Visual Perception of Shape through Tactile Glances2013In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on, IEEE conference proceedings, 2013, p. 3180-3186Conference paper (Refereed)
    Abstract [en]

    Object shape information is an important parameter in robot grasping tasks. However, it may be difficult to obtain accurate models of novel objects due to incomplete and noisy sensory measurements. In addition, object shape may change due to frequent interaction with the object (cereal boxes, etc). In this paper, we present a probabilistic approach for learning object models based on visual and tactile perception through physical interaction with an object. Our robot explores unknown objects by touching them strategically at parts that are uncertain in terms of shape. The robot starts by using only visual features to form an initial hypothesis about the object shape, then gradually adds tactile measurements to refine the object model. Our experiments involve ten objects of varying shapes and sizes in a real setup. The results show that our method is capable of choosing a small number of touches to construct object models similar to real object shapes and to determine similarities among acquired models.

    Download full text (pdf)
    2013_IROS_bbhk.pdf
  • 348.
    Björkman, Mårten
    et al.
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Bergström, Niklas
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Kragic, Danica
    KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
    Detecting, segmenting and tracking unknown objects using multi-label MRF inference2014In: Computer Vision and Image Understanding, ISSN 1077-3142, E-ISSN 1090-235X, Vol. 118, p. 111-127Article in journal (Refereed)
    Abstract [en]

    This article presents a unified framework for detecting, segmenting and tracking unknown objects in everyday scenes, allowing for inspection of object hypotheses during interaction over time. A heterogeneous scene representation is proposed, with background regions modeled as combinations of planar surfaces and uniform clutter, and foreground objects as 3D ellipsoids. Recent energy minimization methods based on loopy belief propagation, tree-reweighted message passing and graph cuts are studied for the purpose of multi-object segmentation and benchmarked in terms of segmentation quality, as well as computational speed and how easily the methods can be adapted for parallel processing. One conclusion is that the choice of energy minimization method is less important than the way scenes are modeled. Proximities are more valuable for segmentation than similarity in colors, while the benefit of 3D information is limited. It is also shown through practical experiments that, with implementations on GPUs, multi-object segmentation and tracking using state-of-the-art MRF inference methods is feasible, despite the computational costs typically associated with such methods.

    Download full text (pdf)
    2011_CVIU_bbk
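    To illustrate the kind of multi-label energy (a per-pixel data term plus a pairwise Potts smoothness term) that the benchmarked inference methods minimize, here is a tiny sketch with a naive iterated-conditional-modes minimizer; ICM is only a simple stand-in, not one of the methods compared in the article, and the toy unary costs are made up.

    import numpy as np

    def icm(unaries, smoothness=1.0, iterations=5):
        """unaries: (H, W, L) cost of assigning each of L labels to each pixel."""
        h, w, n_labels = unaries.shape
        labels = unaries.argmin(axis=2)              # start from the data term alone
        for _ in range(iterations):
            for y in range(h):
                for x in range(w):
                    costs = unaries[y, x].copy()
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            # Potts penalty for disagreeing with each neighbor's current label.
                            costs += smoothness * (np.arange(n_labels) != labels[ny, nx])
                    labels[y, x] = costs.argmin()
        return labels

    rng = np.random.default_rng(0)
    truth = np.zeros((20, 20), dtype=int); truth[:, 10:] = 1
    unaries = np.stack([(truth == l) * -2.0 for l in (0, 1)], axis=2) + rng.normal(0, 1.0, (20, 20, 2))
    print((icm(unaries) == truth).mean())   # most of the label noise is smoothed away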
  • 349.
    Bladh, Daniel
    Linköping University, Department of Electrical Engineering, Computer Vision.
    Deep Learning-Based Depth Estimation Models with Monocular SLAM: Impacts of Pure Rotational Movements on Scale Drift and Robustness2023Independent thesis Advanced level (degree of Master (Two Years)), 28 HE creditsStudent thesis
    Abstract [en]

    This thesis explores the integration of deep learning-based depth estimation models with the ORB-SLAM3 framework to address challenges in monocular Simultaneous Localization and Mapping (SLAM), particularly focusing on pure rotational movements. The study investigates the viability of using pre-trained generic depth estimation networks, and hybrid combinations of these networks, to replace traditional depth sensors and improve scale accuracy in SLAM systems. A series of experiments are conducted outdoors, utilizing a custom camera setup designed to isolate pure rotational movements. The analysis involves assessing each model's impact on the SLAM process as well as performance indicators (KPIs) on both depth estimation and 3D tracking. Results indicate a correlation between depth estimation accuracy and SLAM performance, underscoring the potential of depth estimation models in enhancing SLAM systems. The findings contribute to the understanding of the role of monocular depth estimation in integrating with SLAM, especially in applications requiring precise spatial awareness for augmented reality.

    Download full text (pdf)
    fulltext
  • 350.
    Blaszczyk, Martin
    Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering.
    Autonomous Quadcopter Landing with Visual Platform Localization2023Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Multicopters such as quadcopters are a popular tool within industries such as mining, shipping and surveillance, where a high level of autonomy can save time, increase efficiency and, most importantly, provide safety. While Unmanned Aerial Vehicles have been a major research area and are used in the mentioned industries, the level of autonomy is still low. Simple actions such as loading and offloading payload or swapping batteries are still manual tasks performed by humans. If multicopters are to be used as an autonomous tool, solutions that let the machines perform the simplest tasks, such as swapping batteries, become an important stepping stone toward the autonomy goals. Earlier works propose landing solutions focused on landing autonomous vehicles, but the lack of accuracy hinders the vehicles from safely docking with a landing platform. This thesis combines multiple areas, such as trajectory generation, visual marker tracking and UAV control, and results are shown in both simulation and laboratory experiments. With the use of a Model Predictive Controller for both trajectory generation and UAV control, a multicopter can safely land on a platform small enough to be mounted on a small mobile robot. Additionally, an algorithm for tuning the trajectory generator is presented, showing how much the weights in the MPC controller can be increased while the system remains stable.

    Download full text (pdf)
    fulltext
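    A hedged sketch of the kind of MPC problem such a landing approach solves: a discrete double-integrator model driven toward the platform under input limits. The horizon, weights, dynamics, and start state below are placeholders, not the thesis's controller or its tuning algorithm.

    import cvxpy as cp
    import numpy as np

    dt, horizon = 0.1, 30
    A = np.block([[np.eye(3), dt * np.eye(3)], [np.zeros((3, 3)), np.eye(3)]])
    B = np.block([[0.5 * dt ** 2 * np.eye(3)], [dt * np.eye(3)]])

    x = cp.Variable((6, horizon + 1))         # position (3) and velocity (3)
    u = cp.Variable((3, horizon))             # commanded accelerations
    x0 = np.array([1.0, 1.0, 2.0, 0, 0, 0])   # start 2 m above and offset from the pad
    goal = np.zeros(6)                        # platform at the origin, zero velocity

    cost, constraints = 0, [x[:, 0] == x0]
    for k in range(horizon):
        cost += cp.sum_squares(x[:, k + 1] - goal) + 0.1 * cp.sum_squares(u[:, k])
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        cp.norm(u[:, k], "inf") <= 3.0]

    cp.Problem(cp.Minimize(cost), constraints).solve()
    print(x.value[:3, -1])   # final position ends up close to the platform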