Image and Video Processing


Total of 18 entries


[1] arXiv:2407.17780 [pdf, html, other]: Title: HF-Fed: Hierarchical based customized Federated Learning Framework for X-Ray Imaging

Tajamul Ashraf, Tisha Madame

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In clinical applications, X-ray technology is vital for noninvasive examinations like mammography, providing essential anatomical information. However, the radiation risk associated with X-ray procedures raises concerns. X-ray reconstruction is crucial in medical imaging for detailed visual representations of internal structures, aiding diagnosis and treatment without invasive procedures. Recent advancements in deep learning (DL) have shown promise in X-ray reconstruction, but conventional DL methods often require centralized aggregation of large datasets, leading to domain shifts and privacy issues. To address these challenges, we introduce the Hierarchical Framework-based Federated Learning method (HF-Fed) for customized X-ray imaging. HF-Fed tackles X-ray imaging optimization by decomposing the problem into local data adaptation and holistic X-ray imaging. It employs a hospital-specific hierarchical framework and a shared common imaging network called Network of Networks (NoN) to acquire stable features from diverse data distributions. The hierarchical hypernetwork extracts domain-specific hyperparameters, conditioning the NoN for customized X-ray reconstruction. Experimental results demonstrate HF-Fed's competitive performance, offering a promising solution for enhancing X-ray imaging without data sharing. This study significantly contributes to the literature on federated learning in healthcare, providing valuable insights for policymakers and healthcare providers. The source code and pre-trained HF-Fed model are available at this https URL.
[2] arXiv:2407.17882 [pdf, html, other]: Title: Artificial Immunofluorescence in a Flash: Rapid Synthetic Imaging from Brightfield Through Residual Diffusion

Xiaodan Xing, Chunling Tang, Siofra Murdoch, Giorgos Papanastasiou, Yunzhe Guo, Xianglu Xiao, Jan Cross-Zamirski, Carola-Bibiane Schönlieb, Kristina Xiao Liang, Zhangming Niu, Evandro Fei Fang, Yinhai Wang, Guang Yang

Subjects: Image and Video Processing (eess.IV)

Immunofluorescent (IF) imaging is crucial for visualizing biomarker expression and cell morphology and for assessing the effects of drug treatments on sub-cellular components. IF imaging requires an extra staining process and often cell fixation, so it may introduce artefacts and alter endogenous cell morphology. Some IF stains are expensive or not readily available, which hinders experiments. Recent diffusion models, which synthesise high-fidelity IF images from easy-to-acquire brightfield (BF) images, offer a promising solution but are hindered by training instability and slow inference times due to the noise diffusion process. This paper presents a novel method for the conditional synthesis of IF images directly from BF images along with cell segmentation masks. Our approach employs a Residual Diffusion process that enhances stability and significantly reduces inference time. We performed a critical evaluation against other image-to-image synthesis models, including UNets, GANs, and advanced diffusion models. Our model demonstrates significant improvements in image quality (p<0.05 in MSE, PSNR, and SSIM), inference speed (26 times faster than competing diffusion models), and accurate segmentation results for both nuclei and cell bodies (0.77 and 0.63 mean IOU for nuclei and cell true positives, respectively). This paper is a substantial advancement in the field, providing robust and efficient tools for cell image analysis.
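The mean IOU figures quoted in the abstract above compare predicted and reference segmentation masks. As a minimal sketch of the per-mask computation, assuming binary masks stored as nested lists of 0/1 (the function name `iou` and the empty-mask convention are illustrative assumptions, not taken from the paper's code):

```python
def iou(pred, target):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0   # pixel set in both masks
            union += 1 if (p or t) else 0    # pixel set in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union
```

A mean IOU would then be the average of this value over the matched objects or images, depending on the evaluation protocol.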
[3] arXiv:2407.18026 [pdf, html, other]: Title: Segmentation-guided MRI reconstruction for meaningfully diverse reconstructions

Jan Nikolas Morshuis, Matthias Hein, Christian F. Baumgartner

Comments: Accepted at DGM4MICCAI 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Inverse problems, such as accelerated MRI reconstruction, are ill-posed, and an infinite number of possible and plausible solutions exist. This may not only lead to uncertainty in the reconstructed image but also in downstream tasks such as semantic segmentation. This uncertainty, however, is mostly not analyzed in the literature, even though probabilistic reconstruction models are commonly used. These models can be prone to ignore plausible but unlikely solutions like rare pathologies. Building on MRI reconstruction approaches based on diffusion models, we add guidance to the diffusion process during inference, generating two meaningfully diverse reconstructions corresponding to an upper and lower bound segmentation. The reconstruction uncertainty can then be quantified by the difference between these bounds, which we coin the 'uncertainty boundary'. We analyzed the behavior of the upper and lower bound segmentations for a wide range of acceleration factors and found the uncertainty boundary to be both more reliable and more accurate compared to repeated sampling. Code is available at this https URL.
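The abstract above quantifies uncertainty by the difference between an upper-bound and a lower-bound segmentation. One simple way to picture this, as a sketch only: treat both segmentations as binary masks and mark the pixels where they disagree. Measuring the difference by pixel count is an assumption made here for illustration, and `uncertainty_boundary` is a hypothetical helper name, not the authors' implementation.

```python
def uncertainty_boundary(lower_mask, upper_mask):
    """Return (diff_mask, area) for two equally sized binary masks.

    diff_mask marks pixels where the lower- and upper-bound segmentations
    disagree; area is the number of such pixels.
    """
    diff = [
        [1 if lo != up else 0 for lo, up in zip(row_lo, row_up)]
        for row_lo, row_up in zip(lower_mask, upper_mask)
    ]
    area = sum(sum(row) for row in diff)
    return diff, area


lower = [[0, 1, 0],
         [0, 1, 0]]
upper = [[1, 1, 1],
         [0, 1, 1]]
diff_mask, area = uncertainty_boundary(lower, upper)
```

A larger disagreement area would indicate that the measurements constrain the segmented structure less tightly.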
[4] arXiv:2407.18054 [pdf, html, other]: Title: LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The segmentation of cell nuclei in tissue images stained with the blood dye hematoxylin and eosin (H&E) is essential for various clinical applications and analyses. Due to the complex characteristics of cellular morphology, a large receptive field is considered crucial for generating high-quality segmentation. However, previous methods face challenges in achieving a balance between the receptive field and computational burden. To address this issue, we propose LKCell, a high-accuracy and efficient cell segmentation method. Its core insight lies in unleashing the potential of large convolution kernels to achieve computationally efficient large receptive fields. Specifically, (1) We transfer pre-trained large convolution kernel models to the medical domain for the first time, demonstrating their effectiveness in cell segmentation. (2) We analyze the redundancy of previous methods and design a new segmentation decoder based on large convolution kernels. It achieves higher performance while significantly reducing the number of parameters. We evaluate our method on the most challenging benchmark and achieve state-of-the-art results (0.5080 mPQ) in cell nuclei instance segmentation with only 21.6% FLOPs compared with the previous leading method. Our source code and models are available at this https URL.
[5] arXiv:2407.18070 [pdf, html, other]: Title: CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation

Xiao Liu, Peng Gao, Tao Yu, Fei Wang, Ru-Yue Yuan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning, especially convolutional neural networks (CNNs) and Transformer architectures, has become the focus of extensive research in medical image segmentation, achieving impressive results. However, CNNs come with inductive biases that limit their effectiveness in more complex, varied segmentation scenarios. Conversely, while Transformer-based methods excel at capturing global and long-range semantic details, they suffer from high computational demands. In this study, we propose CSWin-UNet, a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet to facilitate horizontal and vertical stripes self-attention. This method significantly enhances both computational efficiency and receptive field interactions. Additionally, our innovative decoder utilizes a content-aware reassembly operator that strategically reassembles features, guided by predicted kernels, for precise image resolution restoration. Our extensive empirical evaluations on diverse datasets, including synapse multi-organ CT, cardiac MRI, and skin lesions, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy.
[6] arXiv:2407.18105 [pdf, html, other]: Title: Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping

Jack Breen, Katie Allen, Kieran Zucker, Nicolas M. Orsi, Nishant Ravikumar

Comments: Initially submitted version of a paper which has been accepted in the GRAIL workshop at MICCAI 2024

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Computer vision models are increasingly capable of classifying ovarian epithelial cancer subtypes, but they differ from pathologists by processing small tissue patches at a single resolution. Multi-resolution graph models leverage the spatial relationships of patches at multiple magnifications, learning the context for each patch. In this study, we conduct the most thorough validation of a graph model for ovarian cancer subtyping to date. Seven models were tuned and trained using five-fold cross-validation on a set of 1864 whole slide images (WSIs) from 434 patients treated at Leeds Teaching Hospitals NHS Trust. The cross-validation models were ensembled and evaluated using a balanced hold-out test set of 100 WSIs from 30 patients, and an external validation set of 80 WSIs from 80 patients in the Transcanadian Study. The best-performing model, a graph model using 10x+20x magnification data, gave balanced accuracies of 73%, 88%, and 99% in cross-validation, hold-out testing, and external validation, respectively. However, this only exceeded the performance of attention-based multiple instance learning in external validation, with a 93% balanced accuracy. Graph models benefitted greatly from using the UNI foundation model rather than an ImageNet-pretrained ResNet50 for feature extraction, with this having a much greater effect on performance than changing the subsequent classification approach. The accuracy of the combined foundation model and multi-resolution graph network offers a step towards the clinical applicability of these models, with a new highest-reported performance for this task, though further validations are still required to ensure the robustness and usability of the models.
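The results in the abstract above are reported as balanced accuracy, which is the mean of the per-class recalls and therefore does not reward a model for favouring the most common subtype. A minimal sketch of the standard definition follows; the labels in the example are made up for illustration and are not data from the study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Illustrative labels: a classifier that always predicts class "A".
y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]
score = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy is 80%, while balanced accuracy is 50%, because the recall for class "B" is zero.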
[7] arXiv:2407.18195 [pdf, html, other]: Title: Advanced depth estimation and 3D geometry reconstruction using Bayesian Helmholtz stereopsis with belief propagation

Razieh Azizi, Hamidreza Amindavar, Hassan Aghaeinia

Comments: 19 pages

Subjects: Image and Video Processing (eess.IV)

Helmholtz stereopsis (HS) is one of the versatile techniques for 3D geometry reconstruction from 2D images of objects with unknown and arbitrary reflectance surfaces. HS eliminates the need for surface reflectance, a challenging parameter to measure, based on the Helmholtz reciprocity principle. Its Bayesian formulation using the maximum a posteriori (MAP) probability approach has significantly improved the reconstruction accuracy of the HS method. This framework enables the inclusion of smoothness priors, which enforce observations and neighborhood information in the formulation. We used Markov random fields (MRFs), a powerful tool for integrating diverse prior contextual information, and solved the MAP-MRF problem using the belief propagation algorithm. We propose a new smoothness function utilizing the normal field integration method for refined depth estimation within the Bayesian framework. Utilizing three pairs of images with different viewpoints, our approach demonstrates superior depth label accuracy compared to conventional Bayesian methods. Experimental results indicate that our proposed method yields a better depth map with reduced RMS error, showcasing its efficacy in improving depth estimation within Helmholtz stereopsis.

[8] arXiv:2407.18128 (cross-list from cs.CV) [pdf, html, other]: Title: Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking

Daniele Rege Cambrin, Isaac Corley, Paolo Garza, Peyman Najafirad

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Earthquakes are commonly estimated using physical seismic stations; however, due to the installation requirements and costs of these stations, global coverage quickly becomes impractical. An efficient and lower-cost alternative is to develop machine learning models to globally monitor earth observation data to pinpoint regions impacted by these natural disasters. However, due to the small number of historically recorded earthquakes, this becomes a low-data regime problem requiring algorithmic improvements to achieve peak performance when learning to regress earthquake magnitude. In this paper, we propose to pose the estimation of earthquake magnitudes as a metric-learning problem, training models to not only estimate earthquake magnitude from Sentinel-1 satellite imagery but to additionally rank pairwise samples. Our experiments show up to a 30%+ improvement in MAE over prior regression-only based methods, particularly transformer-based architectures.
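The abstract above combines magnitude regression with pairwise ranking. The exact loss is not given there, so the following is only a sketch of one common formulation: mean absolute error plus a margin-based pairwise ranking term. The function names, the `weight` and `margin` parameters, and the example numbers are all assumptions for illustration.

```python
def mae(preds, targets):
    """Mean absolute error between predicted and true magnitudes."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def pairwise_ranking_loss(preds, targets, margin=0.0):
    """Average hinge penalty over pairs whose predicted order contradicts the true order."""
    losses = []
    n = len(preds)
    for i in range(n):
        for j in range(i + 1, n):
            if targets[i] == targets[j]:
                continue  # tied targets carry no ordering information
            sign = 1.0 if targets[i] > targets[j] else -1.0
            losses.append(max(0.0, margin - sign * (preds[i] - preds[j])))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(preds, targets, weight=1.0, margin=0.0):
    """Regression term plus a weighted ranking term (hypothetical weighting)."""
    return mae(preds, targets) + weight * pairwise_ranking_loss(preds, targets, margin)


# Illustrative values: the two predictions are in the wrong order.
wrong_order = combined_loss([5.0, 6.0], [6.0, 5.0])
right_order = combined_loss([6.0, 5.0], [6.0, 5.0])
```

The ranking term penalises only the relative order of a pair, which gives the model a training signal from every pair of samples even when few labelled events are available.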
[9] arXiv:2407.18141 (cross-list from cs.HC) [pdf, html, other]: Title: IRIS: Wireless Ring for Vision-based Smart Home Interaction

Maruchi Kim, Antonio Glenn, Bandhav Veluri, Yunseo Lee, Eyoel Gebre, Aditya Bagaria, Shwetak Patel, Shyamnath Gollakota

Comments: 15 pages, 17 figures, 6 tables, to be published in UIST 2024

Subjects: Human-Computer Interaction (cs.HC); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-aware, adapting its gesture set to the detected device, and can last for 16-24 hours on a single charge. IRIS leverages the scene semantics to achieve instance-level device recognition. In a study involving 23 participants, IRIS consistently outpaced voice commands, with a higher proportion of participants expressing a preference for IRIS over voice commands regarding toggling a device's state, granular control, and social acceptability. Our work pushes the boundary of what is possible with ring form-factor devices, addressing system challenges and opening up novel interaction capabilities.

[10] arXiv:2108.08158 (replaced) [pdf, html, other]: Title: Practical X-ray Gastric Cancer Screening Using Refined Stochastic Data Augmentation and Hard Boundary Box Training

Hideaki Okamoto, Takakiyo Nomura, Kazuhito Nabeshima, Jun Hashimoto, Hitoshi Iyatomi

Comments: 19 pages, 6 figures

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Endoscopy is widely used to diagnose gastric cancer and has a high diagnostic performance, but because it must be performed by a physician, the number of people who can be diagnosed is limited. Gastric X-ray, on the other hand, can be performed by technicians and can screen a much larger number of patients than endoscopy, but its correct diagnosis requires experience. We propose an unprecedented and practical gastric cancer diagnosis support system for gastric X-ray images, which will enable more people to be screened. The system is based on a general deep learning-based object detection model and includes two novel technical proposals: refined probabilistic stomach image augmentation (R-sGAIA) and hard boundary box training (HBBT). R-sGAIA is a probabilistic gastric fold region enhancement method that provides more learning patterns for cancer detection models. HBBT is an efficient training method for object detection models that allows the use of unannotated negative (i.e., healthy control) samples that cannot be used for training in conventional detection models, thereby improving model performance. The sensitivity (SE) of the proposed system for gastric cancer (90.2%) is higher than that of the expert (85.5%), and two out of five detected candidate boxes are cancerous, achieving a high precision while maintaining a high processing speed of 0.51 seconds/image. The proposed system showed 5.9 points higher on the F1 score compared to methods using the same object detection model and state-of-the-art data augmentation. In short, the system quickly and efficiently shows the radiologist where to look, greatly reducing the radiologist's workload.
[11] arXiv:2306.14596 (replaced) [pdf, html, other]: Title: Deep Learning for Cancer Prognosis Prediction Using Portrait Photos by StyleGAN Embedding

Amr Hagag, Ahmed Gomaa, Dominik Kornek, Andreas Maier, Rainer Fietkau, Christoph Bert, Florian Putz, Yixing Huang

Comments: MICCAI 2024 Early Accept

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Survival prediction for cancer patients is critical for optimal treatment selection and patient management. Current patient survival prediction methods typically extract survival information from patients' clinical record data or biological and imaging data. In practice, experienced clinicians can have a preliminary assessment of patients' health status based on patients' observable physical appearances, which are mainly facial features. However, such assessment is highly subjective. In this work, the efficacy of objectively capturing and using prognostic information contained in conventional portrait photographs using deep learning for survival prediction purposes is investigated for the first time. A pre-trained StyleGAN2 model is fine-tuned on a custom dataset of our cancer patients' photos to empower its generator with generative ability suitable for patients' photos. The StyleGAN2 is then used to embed the photographs to its highly expressive latent space. Utilizing the state-of-the-art survival analysis models and based on StyleGAN's latent space photo embeddings, this approach achieved a C-index of 0.677, which is notably higher than chance, evidencing the prognostic value embedded in simple 2D facial images. In addition, thanks to StyleGAN's interpretable latent space, our survival prediction model can be validated for relying on essential facial features, eliminating any biases from extraneous information like clothing or background. Moreover, a health attribute is obtained from regression coefficients, which has important potential value for patient care.
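The C-index reported in the abstract above measures how often a model ranks pairs of patients in the right order, with 0.5 corresponding to chance. The sketch below follows the usual Harrell-style definition with right censoring; it is not the authors' evaluation code, and the example values are invented.

```python
def concordance_index(times, events, risks):
    """Harrell-style concordance index.

    times:  observed follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative data: three subjects, the last one censored.
times = [1, 2, 3]
events = [1, 1, 0]
perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
uninformative = concordance_index(times, events, [0.3, 0.3, 0.3])
```

A constant risk score gives 0.5 under this definition, which is the chance level the abstract compares against.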
[12] arXiv:2310.11637 (replaced) [pdf, html, other]: Title: FixPix: Fixing Bad Pixels using Deep Learning

Sreetama Sarkar, Xinan Ye, Gourav Datta, Peter A. Beerel

Subjects: Image and Video Processing (eess.IV)

Efficient and effective on-line detection and correction of bad-pixels can improve yield and increase the expected lifetime of image sensors. This paper presents a comprehensive Deep Learning (DL) based on-line detection and correction approach, suitable for a wide range of pixel corruption rates. A confidence calibrated segmentation approach is introduced, which achieves nearly perfect bad pixel detection, even with a few training samples. A computationally light-weight correction algorithm is proposed for low rates of pixel corruption, that surpasses the accuracy of traditional interpolation-based techniques. In addition, a vision transformer (ViT) auto-encoder based image reconstruction approach is presented which yields promising results for high rates of pixel corruption or clustered defects. Unlike previous methods, which use proprietary images, we demonstrate the efficacy of the proposed methods on the open-source Samsung S7 ISP and MIT-Adobe FiveK datasets. Our approaches yield up to 99.6% detection accuracy with <0.6% false positives and corrected images within 1.5% average pixel error from 70% corrupted images. We achieve correction error at par with the state-of-the-art (SoTA) DL methods for clustered defects with less than half the computational cost.
[13] arXiv:2402.15534 (replaced) [pdf, html, other]: Title: DiCoM -- Diverse Concept Modeling towards Enhancing Generalizability in Chest X-Ray Studies

Abhijeet Parida, Daniel Capellan-Martin, Sara Atito, Muhammad Awais, Maria J. Ledesma-Carbayo, Marius G. Linguraru, Syed Muhammad Anwar

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Chest X-Ray (CXR) is a widely used clinical imaging modality and has a pivotal role in the diagnosis and prognosis of various lung and heart related conditions. Conventional automated clinical diagnostic tool design strategies relying on radiology reads and supervised learning entail the cumbersome requirement of high quality annotated training data. To address this challenge, self-supervised pre-training has proven to outperform supervised pre-training in numerous downstream vision tasks, representing a significant breakthrough in the field. However, medical imaging pre-training significantly differs from pre-training with natural images (e.g., ImageNet) due to unique attributes of clinical images. In this context, we introduce Diverse Concept Modeling (DiCoM), a novel self-supervised training paradigm that leverages a student-teacher framework for learning diverse concepts and hence effective representation of the CXR data. The approach thus expands beyond merely modeling a single primary label within an image and instead harnesses the information from all the concepts inherent in the CXR. The pre-trained model is subsequently fine-tuned to address diverse domain-specific tasks. Our proposed paradigm consistently demonstrates robust performance across multiple downstream tasks on multiple datasets, highlighting the success and generalizability of the pre-training strategy. To establish the efficacy of our methods, we analyze both the power of learned representations and the speed of convergence (SoC) of our models. For diverse data and tasks, DiCoM is able to achieve in most cases better results compared to other state-of-the-art pre-training strategies. This, when combined with the higher SoC and generalization capabilities, positions DiCoM to be established as a foundation model for CXRs, a widely used imaging modality.
[14] arXiv:2403.02307 (replaced) [pdf, html, other]: Title: Harnessing Intra-group Variations Via a Population-Level Context for Pathology Detection

P. Bilha Githinji, Xi Yuan, Zhenglin Chen, Ijaz Gul, Dingqi Shang, Wen Liang, Jianming Deng, Dan Zeng, Dongmei yu, Chenggang Yan, Peiwu Qin

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Realizing sufficient separability between the distributions of healthy and pathological samples is a critical obstacle for pathology detection convolutional models. Moreover, these models exhibit a bias for contrast-based images, with diminished performance on texture-based medical images. This study introduces the notion of a population-level context for pathology detection and employs a graph theoretic approach to model and incorporate it into the latent code of an autoencoder via a refinement module we term PopuSense. PopuSense seeks to capture additional intra-group variations inherent in biomedical data that a local or global context of the convolutional model might miss or smooth out. Proof-of-concept experiments on contrast-based and texture-based images, with minimal adaptation, encounter the existing preference for intensity-based input. Nevertheless, PopuSense demonstrates improved separability in contrast-based images, presenting an additional avenue for refining representations learned by a model.
[15] arXiv:2405.10870 (replaced) [pdf, html, other]: Title: Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases Autosegmentation

Yixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian Putz

Comments: Official published version in the Green Journal: this https URL

Journal-ref: Radiotherapy & Oncology. 2024, 198, 110419, 1-8

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently, bilateral collaboration was evaluated, where a UKER pretrained model is shared with another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training.
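The results above are reported as detection F1 scores, and they note that naive TL raises sensitivity while compromising precision. The sketch below shows how F1 depends on both quantities through true positive, false positive, and false negative counts. The counts in the example are invented for illustration and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 score from detection counts: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is the sensitivity
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: a balanced detector versus one with higher
# sensitivity but many false positives.
balanced = f1_score(tp=8, fp=2, fn=2)           # precision 0.80, recall 0.80
over_detecting = f1_score(tp=9, fp=11, fn=1)    # precision 0.45, recall 0.90
```

Because F1 is a harmonic mean, the second detector scores lower overall despite finding more true lesions.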
[16] arXiv:2407.01469 (replaced) [pdf, html, other]: Title: Unrolling Plug-and-Play Gradient Graph Laplacian Regularizer for Image Restoration

Jianghe Cai, Gene Cheung, Fei Chen

Subjects: Image and Video Processing (eess.IV)

Generic deep learning (DL) networks for image restoration like denoising and interpolation lack mathematical interpretability, require voluminous training data to tune a large parameter set, and are fragile in the face of covariate shift. To address these shortcomings, we build interpretable networks by unrolling variants of a graph-based optimization algorithm of different complexities. Specifically, for a general linear image formation model, we first formulate a convex quadratic programming (QP) problem with a new $\ell_2$-norm graph smoothness prior called gradient graph Laplacian regularizer (GGLR) that promotes piecewise planar (PWP) signal reconstruction. To solve the posed unconstrained QP problem, instead of computing a linear system solution straightforwardly, we introduce a variable number of auxiliary variables and correspondingly design a family of ADMM algorithms. We then unroll them into variable-complexity feed-forward networks, amenable to parameter tuning via back-propagation. More complex unrolled networks require more labeled data to train more parameters, but have better overall performance. The unrolled networks contain periodic insertions of a graph learning module, akin to a self-attention mechanism in a transformer architecture, to learn pairwise similarity structure inherent in data. Experimental results show that our unrolled networks perform competitively to generic DL networks in image restoration quality while using only a tiny fraction of parameters, and demonstrate improved robustness to covariate shift.
[17] arXiv:2407.07720 (replaced) [pdf, html, other]: Title: SvANet: A Scale-variant Attention-based Network for Small Medical Object Segmentation

Wei Dai, Rui Liu, Zixuan Wu, Tianyi Wu, Min Wang, Junxian Zhou, Yixuan Yuan, Jun Liu

Comments: 14 pages, 9 figures, under review

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. A mild syndrome with small infected regions is an ominous warning and is foremost in the early diagnosis of diseases. Deep learning algorithms, such as convolutional neural networks (CNNs), have been used to segment natural or medical objects, showing promising results. However, analyzing medical objects of small areas in images remains a challenge due to information losses and compression defects caused by convolution and pooling operations in CNNs. These losses and defects become increasingly significant as the network deepens, particularly for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurate small-scale object segmentation in medical images. The SvANet consists of Monte Carlo attention, scale-variant attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
[18] arXiv:2407.17324 (replaced) [pdf, html, other]: Title: Enhanced Deep Learning Methodologies and MRI Selection Techniques for Dementia Diagnosis in the Elderly Population

Nikolaos Ntampakis, Konstantinos Diamantaras, Ioanna Chouvarda, Vasileios Argyriou, Panagiotis Sarigianndis

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Dementia, a debilitating neurological condition affecting millions worldwide, presents significant diagnostic challenges. In this work, we introduce a novel methodology for the classification of demented and non-demented elderly patients using 3D brain Magnetic Resonance Imaging (MRI) scans. Our approach features a unique technique for selectively processing MRI slices, focusing on the most relevant brain regions and excluding less informative sections. This methodology is complemented by a confidence-based classification committee composed of three custom deep learning models: Dem3D ResNet, Dem3D CNN, and Dem3D EfficientNet. These models work synergistically to enhance decision-making accuracy, leveraging their collective strengths. Tested on the Open Access Series of Imaging Studies (OASIS) dataset, our method achieved an impressive accuracy of 94.12%, surpassing existing methodologies. Furthermore, validation on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset confirmed the robustness and generalizability of our approach. The use of explainable AI (XAI) techniques and comprehensive ablation studies further substantiate the effectiveness of our techniques, providing insights into the decision-making process and the importance of our methodology. This research offers a significant advancement in dementia diagnosis, providing a highly accurate and efficient tool for clinical applications.


New submissions for Friday, 26 July 2024 (showing 7 of 7 entries)

Cross submissions for Friday, 26 July 2024 (showing 2 of 2 entries)

Replacement submissions for Friday, 26 July 2024 (showing 9 of 9 entries)