11 results for "methodology"
Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters
Objective. Clinical AI documentation systems require evaluation methodologies that are clinically valid, economically viable, and sensitive to iterative changes. Methods requiring expert review per sc…
Show HN: A free ESG stock screener that publishes its losses and methodology
Hey HN, JSS (JumpstartSignal) is a free, ESG-filtered daily stock screener. I built it after some really badly-timed quantum computing stock buys, so I felt I needed to learn more about systematic, lon…
UGAF-ITS: A Standards Harmonization Framework and Validation Tool for Multi-Framework AI Governance in Distributed Intelligent Transportation Systems
Organizations deploying AI-enabled Intelligent Transportation Systems face fragmented governance: ISO/IEC 42001 demands a certifiable management system, the EU AI Act imposes binding high-risk obligat…
Buy LAMFX, But Buy It On Your Own
LAMFX's methodology focuses on companies with a history of dividend increases, not yield, signaling financial strength and resilience. Read why LAMFX fund is a Buy.…
A Systematic Approach for Large Language Models Debugging
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains…
Do Transaction-Level and Actor-Level AML Queues Agree? An Empirical Evaluation of Granularity Effects on the Elliptic++ Graph
Graph-based anti-money laundering (AML) systems on blockchain networks can score suspicious activity at two granularity levels -- transactions or actor addresses -- yet compliance action is conducted …
FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification
Financial AI systems must produce answers grounded in specific regulatory filings, yet current LLMs fabricate metrics, invent citations, and miscalculate derived quantities. These errors carry direct …
AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment
Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment. We introduce AgentPulse, a continuous evaluation framew…
Right-to-Act: A Pre-Execution Non-Compensatory Decision Protocol for AI Systems
Current AI systems increasingly operate in contexts where their outputs directly trigger real-world actions. Most existing approaches to AI safety, risk management, and governance focus on post-hoc va…
HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license
HauhauCS (u/hauhau901) publishes uncensored LLM models on HuggingFace with 5M+ combined monthly downloads across 22 models (verified via the HuggingFace API, April 2026). Every model card claims "0/…
LabelSets — open quality standard for AI training data (LQS v3.1) [D]
Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contaminat…