17 results for "ai datasets"
JSONL Explained: The Line-by-Line Format Powering AI Datasets
You're trying to load a 500,000-record dataset into your script. You reach for JSON — it's universal,...…
IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance
Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…
MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation
The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…
Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion
Multimodal affective computing analyzes user-generated social media content to predict emotional states. However, a critical gap remains in understanding how visual content shapes cognitive interpreta…
FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision…
Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions
We instruct an AI agent to construct two separate agentic AI platforms: one for autonomous training of predictive ML models for human-human and virus-human PPI, and the other for inducing explicit gen…
An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness
Artificial Intelligence and Machine Learning (AI/ML) models used in clinical settings are increasingly deployed to support clinical decision-making. However, when training data become stale due to cha…
STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator
The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…
LabelSets — open quality standard for AI training data (LQS v3.1) [D]
Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contaminat…
Does Point Cloud Boost Spatial Reasoning of Large Language Models?
3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite some promising results, the role of point clouds in 3D spatial …
AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting
Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global per…
Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task
Large language models (LLMs) have shown strong performance on legal benchmarks, including multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in …
Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification
The application of Deep Learning in medical diagnosis must balance patient safety with compliance with data protection regulations. Machine Unlearning enables the selective removal of training data fr…
The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability
This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the eth…
Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs
Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…
FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…
Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with com…