Links
This article introduces FinCDM, a framework for assessing financial large language models (LLMs) by evaluating their knowledge and skills rather than relying on a single score. It highlights the creation of a new dataset, CPA-KQA, based on CPA exam questions, which allows for a more nuanced analysis of LLM capabilities in financial contexts. The framework aims to uncover knowledge gaps and enhance model development for real-world applications.
The article presents Golden Goose, a method to create unlimited Reinforcement Learning with Verifiable Rewards (RLVR) tasks by using unverifiable internet text. It describes how the authors developed a large-scale dataset, GooseReason-0.7M, which includes over 700,000 tasks across various domains. The approach successfully enhances model performance, even in areas like cybersecurity where prior data was unavailable.
REverse-Engineered Reasoning (REER) introduces a novel approach to instilling deep reasoning in language models by working backwards from known solutions to discover the underlying reasoning process. This method addresses the limitations of traditional reinforcement learning and instruction distillation, resulting in the creation of a large dataset, DeepWriting-20K, and a model, DeepWriter-8B, that outperforms existing models in open-ended tasks. The research emphasizes the importance of structured reasoning and iterative refinement in generating high-quality outputs.
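The core loop described above (start from a known solution, iteratively refine a candidate reasoning trace that explains it) can be sketched as a greedy local search. This is a minimal toy illustration of the idea only: the scoring function and edit proposer below are invented stand-ins, and the actual REER method reportedly uses a perplexity-style measure under a language model rather than word matching.

```python
def refine_trace(solution, trace, propose_edits, score, steps=10):
    """Greedy local search: keep any edit that lowers score(trace, solution)."""
    best, best_score = trace, score(trace, solution)
    for _ in range(steps):
        improved = False
        for candidate in propose_edits(best):
            s = score(candidate, solution)
            if s < best_score:  # accept only strictly improving edits
                best, best_score = candidate, s
                improved = True
        if not improved:  # local optimum reached
            break
    return best

# Toy stand-ins (hypothetical, for illustration only).
def toy_score(trace, solution):
    # Fewer solution words left unexplained by the trace = better trace.
    return sum(1 for w in solution.split() if w not in trace)

def toy_proposer(vocab):
    # Propose appending one candidate word at a time.
    def propose(trace):
        return [trace + " " + w for w in vocab]
    return propose

solution = "sort then binary search"
refined = refine_trace(solution, "first we",
                       toy_proposer(solution.split()), toy_score)
```

The search terminates when no single edit improves the score, mirroring the iterative-refinement framing in the summary.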
Weak-to-Strong Decoding (WSD) is a novel framework designed to enhance the alignment capabilities of large language models (LLMs) by utilizing a smaller aligned model to guide the initial drafting of responses. By integrating a well-aligned draft model, WSD significantly improves the quality of generated content while minimizing the alignment tax, as demonstrated through extensive experiments and the introduction of the GenerAlign dataset. The framework provides a structured approach for researchers to develop safe AI systems while navigating the complexities of preference alignment.
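The drafting-then-continuation scheme can be illustrated with a short sketch. The model interfaces here are toy stand-in functions, not the paper's actual API: in a real WSD setup both calls would go to actual LLMs, with the small aligned model producing only the opening tokens.

```python
def weak_to_strong_decode(prompt, draft_model, base_model, draft_tokens=16):
    # 1) The small, well-aligned model drafts the beginning of the response,
    #    steering the generation toward aligned behavior.
    draft = draft_model(prompt, max_tokens=draft_tokens)
    # 2) The strong base model continues, conditioned on prompt + draft.
    continuation = base_model(prompt + draft)
    return draft + continuation

# Toy stand-in "models" (hypothetical, for illustration only).
def toy_draft_model(prompt, max_tokens):
    return " I can't help with that request."

def toy_base_model(context):
    return " Here is some general safety information instead."

out = weak_to_strong_decode("How do I pick a lock?",
                            toy_draft_model, toy_base_model)
```

Because the strong model only continues an already-aligned prefix rather than being fine-tuned itself, its base capabilities are left untouched, which is how the framework aims to minimize the alignment tax.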
EleutherAI has released the Common Pile v0.1, an 8 TB dataset of openly licensed and public domain text for training large language models, marking a significant advancement from its predecessor, the Pile. The initiative emphasizes the importance of transparency and openness in AI research, aiming to provide researchers with essential tools and a shared corpus for better collaboration and accountability in the field. Future collaborations with cultural heritage institutions are planned to enhance the quality and accessibility of public domain works.