Search: "ai datasets" — WeSearch Press

DEV COMMUNITY

JSONL Explained: The Line-by-Line Format Powering AI Datasets

You're trying to load a 500,000-record dataset into your script. You reach for JSON — it's universal,...…

Tue, 28 Apr 2026 01:54:30 GMT · 3 views

ARXIV.ORG

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Industrial maintenance environments increasingly rely on AI systems to assist operators in understanding asset behavior, diagnosing failures, and evaluating interventions. Although large language mode…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automate…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion

Multimodal affective computing analyzes user-generated social media content to predict emotional states. However, a critical gap remains in understanding how visual content shapes cognitive interpreta…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

In recent years, the integration of multimodal machine learning in wellbeing assessment has offered transformative potential for monitoring mental health. However, with the rapid advancement of Vision…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions

We instruct an AI agent to construct two separate agentic AI platforms: one for autonomous training of predictive ML models for human-human and virus-human PPI, and the other for inducing explicit gen…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness

Artificial Intelligence and Machine Learning (AI/ML) models used in clinical settings are increasingly deployed to support clinical decision-making. However, when training data become stale due to cha…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

LabelSets — open quality standard for AI training data (LQS v3.1) [D]

Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contaminat…

Sun, 26 Apr 2026 20:54:30 GMT · 6 views

ARXIV.ORG

Does Point Cloud Boost Spatial Reasoning of Large Language Models?

3D Large Language Models (LLMs) leveraging spatial information in point clouds for 3D spatial reasoning attract great attention. Despite some promising results, the role of point clouds in 3D spatial …

Tue, 28 Apr 2026 14:55:00 GMT · 3 views

ARXIV.ORG

AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting

Accurate long-term time series forecasting (LTSF) requires the capture of complex long-range dependencies and dynamic periodic patterns. Recent advances in frequency-domain analysis offer a global per…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Large language models (LLMs) have shown strong performance on legal benchmarks, including multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in …

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification

The application of Deep Learning in medical diagnosis must balance patient safety with compliance with data protection regulations. Machine Unlearning enables the selective removal of training data fr…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability

This study introduces the Kerimov-Alekberli model, a novel information-geometric framework that redefines AI safety by formally linking non-equilibrium thermodynamics to stochastic control for the eth…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

Large Vision-Language Models (LVLMs) frequently suffer from hallucinations. Existing preference learning-based approaches largely rely on proprietary models to construct preference datasets. We identi…

Tue, 28 Apr 2026 04:13:21 GMT · 2 views

ARXIV.ORG

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaboration, enabled the harmonisation of el…

Tue, 28 Apr 2026 04:13:21 GMT · 3 views

ARXIV.ORG

Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]

Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with com…

Tue, 28 Apr 2026 01:24:29 GMT · 4 views

Results for "ai datasets".

JSONL Explained: The Line-by-Line Format Powering AI Datasets

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

Modeling Induced Pleasure through Cognitive Appraisal Prediction via Multimodal Fusion

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions

An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

LabelSets — open quality standard for AI training data (LQS v3.1) [D]

Does Point Cloud Boost Spatial Reasoning of Large Language Models?

AdaMamba: Adaptive Frequency-Gated Mamba for Long-Term Time Series Forecasting

Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Does Machine Unlearning Preserve Clinical Safety? A Risk Analysis for Medical Image Classification

The Kerimov-Alekberli Model: An Information-Geometric Framework for Real-Time System Stability

Aligning with Your Own Voice: Self-Corrected Preference Learning for Hallucination Mitigation in LVLMs

FastOMOP: A Foundational Architecture for Reliable Agentic Real-World Evidence Generation on OMOP CDM data

Microsoft TRELLIS.2: An Open-Source, 4B-Parameter, Image-to-3D Model [pdf]

Or browse by topic