
Computer Vision coverage.

Every story in the WeSearch catalog tagged with #computer-vision, listed in publish-time order with view counts. There are currently 60 stories under this tag, and the page updates as new stories are ingested. Subscribe to the per-tag RSS feed to follow this topic in your reader of choice.


RELATED TAGS
#ai (77) · #ml (34) · #robotics (9) · #medical-imaging (4) · #technology (3) · #image-processing (3) · #image-generation (3) · #remote-sensing (2) · #video-modeling (2) · #research (2) · #video-editing (2) · #deep-learning (2)
NVIDIA BLOG

NVIDIA Research Unlocks Advanced Grasping, Smarter Autonomous Driving and Agent Training at Scale

New NVIDIA Research breakthroughs show how training at scale — across gripper types, driving scenarios and virtual worlds — creates AI that generalizes to diverse applications.…

16 views ·
#artificial intelligence#robotics#autonomous vehicles
ARXIV CS.AI

Effect of Demographic Bias on Skin Lesion Classification

In this study, we evaluate the performance of skin lesion classification using ResNet-based convolutional models, focusing on the impact of demographic bias in training data, parti…

17 views ·
#artificial intelligence#machine learning
APPLEINSIDER

Apple's AI research will be in a computer vision conference before WWDC

Apple will present 14 AI research papers at the 2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition in Denver next week, spanning image generation, spatial understa…

15 views ·
#technology#artificial intelligence
9TO5MAC

Apple to showcase computer vision studies at annual conference in June

Apple has shared details of its participation in this year’s IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).…

22 views ·
#technology#conference
ARXIV CS.AI

FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning

Vision-language models such as CLIP have shown impressive capabilities in aligning images and text, but they often struggle with lengthy and detailed text descriptions due to pre-t…

19 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

AssetGen: Deployable 3D Asset Generation at Interactive Speed

While 3D generation is progressing rapidly, recent work has often focused on obtaining high-resolution assets, leaving user experience and deployability as afterthoughts. We presen…

14 views ·
#3d graphics#artificial intelligence
ARXIV CS.AI

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

We present VISTA (VIsual Spec-To-App Benchmark), a benchmark for evaluating the end-to-end web-app generation capabilities of LLM-based agents. Unlike prior code generation benchma…

13 views ·
#software engineering#artificial intelligence
ARXIV CS.AI

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative production through AI-driven assistants. Hist…

22 views ·
#artificial intelligence#neural networks
ARXIV CS.AI

Lattice theory and algebraic models for deep convolutional learning based on mathematical morphology

We develop a rigorous algebraic framework for deep convolutional architectures, CNNs, ResNets, and encoder-decoder networks such as UNet, grounded in lattice theory and mathematic…

15 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Mitigating Object Hallucinations in Vision-Language Models through Region-Aware Attention Recalibration

The generation of factually incorrect objects, commonly known as object hallucination, remains a persistent challenge in Large Vision-Language Models (LVLMs). Current approaches to…

15 views ·
#artificial intelligence#machine learning
R/CSCAREERQUESTIONS

Computer Vision Engineer, Looking for advice

13 views ·
ARXIV CS.AI

Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

Benchmark accuracy is often implicitly assumed to reflect grounded visual understanding in vision-language models (VLMs), yet it remains unclear to what extent such scores truly re…

11 views ·
#artificial intelligence#language processing
ARXIV CS.AI

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations ca…

12 views ·
#ai#surveillance#mental health
ARXIV CS.AI

The TIME Machine: On The Power of Motion for Efficient Perception

Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of training and the success of visual models t…

8 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

Dithering Defense: Adversarial Robustness of Vision Foundation Models via Multi-Level Floyd-Steinberg Dithering

Vision foundation models are widely used as frozen backbones across many downstream tasks, making them a single point of failure under adversarial attack. We study multi-level Floy…

10 views ·
#adversarial attacks#machine learning
ARXIV CS.AI

CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection

Existing Video Anomaly Detection (VAD) methods typically rely on task-specific training, leading to strong domain dependency and high training costs. Moreover, most existing method…

10 views ·
#artificial intelligence#video anomaly detection
ARXIV CS.AI

Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking

Tracking tumor lesions across serial CT scans is essential for oncological response assessment. Existing automated methods face a fundamental trade-off: end-to-end trackers achieve…

11 views ·
#healthcare#oncology
ARXIV CS.AI

Lipschitz Optimization for Formal Verification of Homographies

The adoption of vision neural networks in regulated industries requires formal robustness guarantees, especially in safety-critical domains such as healthcare, autonomous vehicles,…

11 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

SimInsert: Seamless Video Object Insertion via Regional Sparse Attention Fusion

Video object insertion requires ensuring spatio-temporal coherence and interactive realism, extending far beyond simple content placement. However, current approaches are often hin…

13 views ·
#artificial intelligence#video editing
ARXIV CS.AI

Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Generative priors in Image Super-Resolution (SR) often compromise faithful restoration; we attribute this limitation to a fundamental spectral misalignment between isotropic object…

15 views ·
#image processing#artificial intelligence
ARXIV CS.AI

ChainFlow-VLA: Causal Flow Planning with Vision-Language Models

Current end-to-end autonomous driving systems are fundamentally limited by a mismatch between temporal causal reasoning and global trajectory consistency. Autoregressive (AR) model…

11 views ·
#artificial intelligence#robotics
ARXIV CS.AI

CHASD: Language Increment-Calibrated Contrastive Decoding against Hallucination in LVLMs

Large Vision-Language Models have shown strong multimodal reasoning capabilities, yet they remain susceptible to object hallucinations when language priors dominate insufficient or…

13 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Online Hand Gesture Recognition Using 3D Convolutional Neural Networks

In human computer interaction, real-time detection and classification of dynamic hand gestures is challenging as: 1) the system must run in a real-time video stream and there is no…

15 views ·
#artificial intelligence#gesture recognition
ARXIV CS.AI

Conflict-Aware Additive Guidance for Flow Models under Compositional Rewards

Inference-time guided sampling steers state-of-the-art diffusion and flow models without fine-tuning by interpreting the generation process as a controllable trajectory. This provi…

14 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education

Assessing learner competency in clinical simulation requires expert observation that is time-intensive, difficult to scale, and subject to inter-rater variability. Vision-language …

16 views ·
#artificial intelligence#nursing education
ARXIV CS.AI

Generation of Heterogeneous PET Images from Uniform Organ Activity Maps Using a Pretrained Domain-Adapted Diffusion Model

Synthetic PET images are valuable for quantitative imaging workflow development, scalable virtual imaging trials, and deep learning model training, but conventional physics-based s…

12 views ·
#artificial intelligence#medical imaging
ARXIV CS.AI

You Don't Need Attention: Gated Convolutional Modeling for Watch-Based Fall Detection

Existing deep learning approaches for wearable fall detection systems rely on self-attention mechanisms that impose quadratic computational overhead, distributing weights across al…

15 views ·
#artificial intelligence#wearable technology
ARXIV CS.AI

Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

Medical vision-language models (VLMs) have rapidly advanced as general-purpose multimodal assistants, yet their deployment in 3D Computed Tomography (CT) analysis remains constrain…

17 views ·
#artificial intelligence#medical imaging
ARXIV CS.AI

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

Long-form image captioning exposes a reward granularity problem in RL: captions are judged as whole sequences, while the important errors occur at the level of individual visual cl…

14 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning

Machine unlearning in Vertical Federated Learning (VFL) has attracted growing interest, yet existing methods certify forgetting solely using output-level metrics. We challenge thes…

16 views ·
#machine learning#federated learning
ARXIV CS.AI

JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually …

14 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

FusionCell: Cross-Attentive Fusion of Layout Geometry and Netlist Topology for Standard-Cell Performance Prediction

Standard cells form the building blocks of digital circuits, so their delay and power critically influence chip-level performance; yet characterization still relies on slow simulat…

15 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

Co-Fusion4D: Spatio-temporal Collaborative Fusion for Robust 3D Object Detection

In autonomous driving, 3D object detection is essential for accurate perception and reliable decision-making. However, object motion and ego-motion often induce cross-frame spatiot…

11 views ·
#autonomous driving#3d object detection
ARXIV CS.AI

SDM: A Powerful Tool for Evaluating Model Robustness

Gradient-based attacks are important methods for evaluating model robustness. However, since the proposal of APGD, it has been difficult for such methods to achieve significant bre…

14 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

Tiny-Engram: Trigger-Indexed Concept Tables for Generative Vision

Current personalization methods for generative vision models typically encode new concepts through continuous adapters or weight updates, yet provide limited control over whether a…

13 views ·
#artificial intelligence#generative models
ARXIV CS.AI

FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation

Modern text-to-image diffusion models encode rich visual priors, but expose them only through one-way text-conditioned generation. Existing unified vision-language models derived …

18 views ·
#artificial intelligence#text-to-image generation
ARXIV CS.AI

Latent Space Guided Scenario Sampling for Multimodal Segmentation Under Missing Modalities

Multimodal semantic segmentation benefits remote sensing analysis by combining complementary information from different sensor modalities. In real-world remote sensing applications…

17 views ·
#artificial intelligence#remote sensing
ARXIV CS.AI

SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework

Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-spe…

17 views ·
#robotics#artificial intelligence
ARXIV CS.AI

ConceptSeg-R1: Segment Any Concept via Meta-Reinforcement Learning

Recent progress in promptable segmentation has shifted visual perception from object-level localization toward concept-level understanding. However, the notion of a concept remains…

14 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

STELLAR: Scaling 3D Perception Large Models for Autonomous Driving

Model scaling has demonstrated remarkable success through large-scale training on diverse datasets. It remains an open question whether the same paradigm would apply to autonomous …

17 views ·
#autonomous driving#machine learning
ARXIV CS.AI

Pixel Wised Lesion Prediction on COVID-19 CT Imagery: A Comparative Analysis of Automated Image Segmentation Architectures

In recent years, there has been a notable increase in the level of attention that is given to algorithms based on deep learning in the context of medical image segmentation. Nevert…

19 views ·
#deep learning#medical imaging
ARXIV CS.AI

EPC-3D-Diff: Equivariant Physics Consistent Conditional 3D Latent Diffusion for CBCT to CT Synthesis

Cone-beam CT (CBCT) is routinely acquired during radiotherapy for patient setup, but its quantitative reliability is degraded by scatter, noise, and reconstruction artifacts, limit…

14 views ·
#medical imaging#artificial intelligence
ARXIV CS.AI

Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate cor…

17 views ·
#machine learning#artificial intelligence
ARXIV CS.AI

ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society

Urban heat exposure is becoming an increasingly critical challenge due to the intensifying urban heat island effect. Fine-grained shade patterns, especially those induced by urban …

17 views ·
#urban planning#climate
ARXIV CS.AI

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets…

21 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning

Visual Place Recognition (VPR) aims to match a query image to reference images of the same place in a large-scale database. Recent state-of-the-art methods employ Vision Transforme…

16 views ·
#artificial intelligence#robotics
ARXIV CS.AI

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Mixture-of-Experts (MoE) models are often interpreted by analysing which categories are routed to which experts. However, routing alone does not reveal what each expert actually en…

18 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models

Diffusion models provide powerful priors for zero-shot video inverse problems, but their real-time deployment is hindered by two inefficiencies: high initial latency caused by holi…

12 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a…

16 views ·
#language#artificial intelligence
ARXIV CS.AI

Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

Text-to-image diffusion models often face a severe trilemma in human portrait generation: text-image alignment, photorealism, and human-perceived aesthetics inherently inhibit one …

19 views ·
#artificial intelligence#image generation
ARXIV CS.AI

Rethinking Cross-Layer Information Routing in Diffusion Transformers

Diffusion Transformers (DiTs) have become a de facto backbone of modern visual generation, and nearly every major axis of their design -- tokenization, attention, conditioning, obj…

17 views ·
#artificial intelligence#transformers
ARXIV CS.AI

SAVER: Selective As-Needed Vision Evidence for Multimodal Information Extraction

Multimodal IE in social media is difficult because a post may attach multiple images that are weakly related, redundant, or even misleading with respect to the text. In this settin…

14 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

Text-to-image models produce graphic design at production scale, but their supervision comes from photo-style preference data with a single overall verdict per comparison. Designer…

18 views ·
#artificial intelligence#graphic design
ARXIV CS.AI

ArchSIBench: Benchmarking the Architectural Spatial Intelligence of Vision-Language Models

Architectural spatial intelligence, the ability to recognize and infer architectural space, is fundamental to tasks such as robot navigation, embodied interaction, and 3D scene und…

19 views ·
#artificial intelligence#benchmarking
ARXIV CS.AI

USV: Towards Understanding the User-generated Short-form Videos

Several large-scale video datasets have been published these years and have advanced the area of video understanding. However, the newly emerged user-generated short-form videos ha…

15 views ·
#artificial intelligence#video analysis
ARXIV CS.AI

AQuaUI: Visual Token Reduction for GUI Agents with Adaptive Quadtrees

Large Multimodal Models (LMMs) have recently emerged as promising backbones for GUI-agent models, where high-resolution GUI screenshots are introduced to the prompts at each iterat…

20 views ·
#artificial intelligence#multiagent systems
ARXIV CS.AI

Is VLA Reasoning Faithful? Probing Safety of Chain-of-Causation

We present the first systematic study of faithfulness in Vision-Language-Action (VLA) driving models, analyzing 300 Alpamayo-R1-10B inferences across 100 diverse PhysicalAI-AV scen…

13 views ·
#artificial intelligence#robotics
ARXIV CS.AI

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

Aligning Text-to-Image (T2I) generation models with human preferences increasingly relies on image reward models that score or rank generated images according to prompt alignment a…

10 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieve…

16 views ·
#artificial intelligence#machine learning
ARXIV CS.AI

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

In real home deployments, household agents must often operate from a complete household scene and a situated household request, rather than from a clean task specification. Such re…

12 views ·
#artificial intelligence#robotics