Links
This article presents a 12-step usability testing process illustrated in comic strips across three different product designs: an app, a grocery site, and a toaster. Each comic uses distinct styles to engage readers while conveying the same core steps in user testing. The author also addresses character consistency issues in the comics and shares insights on the process.
This tool generates Windows PE executables that trigger YARA rule matches, helping users validate their malware detection signatures. It automates the creation of test files based on specific patterns, ensuring effective scanning and rule accuracy. Safe to use, the executables exit immediately without executing harmful code.
mirrord for CI allows developers to run tests directly against a shared staging environment in Kubernetes without deploying code or creating separate test setups. It enhances testing speed and accuracy by connecting CI runners to real services, cutting down on setup time and costs.
This article introduces Sanctum, a tool that automates user simulation to enhance software testing. It converts real user workflows into tests without setup and provides insights into performance and UX. Integrating with existing tools, Sanctum aims to catch bugs and issues before launch.
This page invites users to book a demo of Meticulous, a testing tool that offers extensive test coverage in weeks. It helps identify bugs before they reach production, allowing developers to ship software faster and with greater confidence.
This article outlines mobile app testing options available through QA Wolf. It covers device emulation and real device access for thorough testing across different conditions and settings. The focus is on ensuring apps function correctly in various regional formats and languages.
This article introduces Tripwire, a library for testing error handling in Zig programs. It allows developers to inject errors and verify that cleanup code runs correctly, helping to uncover bugs that might otherwise go unnoticed. The library is optimized away in release builds, ensuring no performance impact.
This article details a test conducted by Mobbin that increased sign-ups for their free account by 7%. By addressing potential user objections with clear language, they effectively reduced perceived risk during the registration process.
SpaceX's upgraded Super Heavy booster has successfully completed cryogenic proof testing, a significant step towards its next Starship flight. The testing involved subjecting the booster to extreme temperatures and pressure cycles, moving the company closer to launching the improved Starship V3 after past failures with the earlier version.
The article details Mike's journey of developing a mobile app called Leafed, a book search engine. He shares insights on using AI tools for planning and feature development, the importance of real device testing, and the practical costs involved in app development.
This article discusses challenges faced by AI agents when performing long tasks across multiple sessions without memory. It introduces a two-part solution using initializer and coding agents to ensure consistent progress, effective environment setup, and structured updates to maintain project integrity.
This article introduces Skyramp, a testing platform designed to manage complex software environments. It offers features like automated test generation, execution in containerized setups, and tools for maintaining test suites without manual effort. Skyramp prioritizes testing gaps based on application specifics and adapts to changes in code and user flows.
This article describes Telescope, a tool for testing web page performance across different browsers. It provides detailed results, including console output, metrics, and screenshots, and supports various parameters for customization. You can run tests via the command line or integrate it into a Node.js script.
This article warns against relying solely on AI for coding without proper system design. It highlights the risks of creating functional but messy products that lack flexibility and robustness. Founders and product teams need to prioritize thoughtful design and testing to avoid complications down the line.
This article details the author's experience with the app submission process for Google Play and Apple’s App Store. It highlights the challenges of meeting testing requirements, gathering user feedback, and the extensive documentation needed for approval. The author emphasizes the importance of understanding and navigating these processes for successful app launches.
This article discusses how AI tools necessitate stricter coding practices to produce high-quality software. It emphasizes the importance of 100% code coverage, thoughtful file organization, and automated best practices to support AI in writing effective code.
This article discusses the development of a Software Factory that leverages AI for non-interactive coding. It focuses on using scenarios instead of traditional tests and introduces the Digital Twin Universe for validating software against behavioral clones of services.
NanoLang is a lightweight programming language designed for large language models. It features mandatory testing, unambiguous syntax, and verified semantics. NanoLang can compile to native C or run on its own virtual machine with isolated foreign function interface support.
Augustus is a new security testing tool designed to identify vulnerabilities in large language models (LLMs), focusing on prompt injection and other attack vectors. Built in Go, it offers faster execution and lower memory usage compared to its Python-based predecessors. With over 210 vulnerability probes, it helps operators assess the security of various LLM providers efficiently.
This article discusses the importance of rigorous testing in software development, particularly for high-availability systems like Jane Street's Aria. It highlights the use of various testing techniques and introduces Antithesis, a tool that helps uncover hidden bugs by simulating real-world chaos in a controlled environment.
The article outlines various design issues in LLVM, including insufficient code review capacity, frequent API changes, and challenges with build times and testing. It emphasizes the need for better testing practices and more stable APIs to enhance user experience and contributor engagement.
OpenAI is testing ads in ChatGPT for U.S. users on free and lower-tier subscriptions. The ads will not affect response quality and user privacy is maintained, with no access to chats by advertisers. Users can opt out of ads through subscription upgrades or by limiting daily messages.
The article emphasizes that insights from past experiments may no longer be valid due to changing contexts. It suggests regularly reassessing decisions and research based on current conditions rather than dismissing them outright because of previous findings.
This article discusses the Hydronium Project, a complete rewrite of the H3 library in Rust, designed for better integration and performance. It highlights the goals of improving safety, speed, and API coverage while presenting testing methodologies and performance benchmarks against the original H3 implementation.
This article discusses the challenges of using AI to generate code for distributed systems, emphasizing that traditional coding practices can lead to bugs that are hard to catch. It argues for frameworks like Hydro that make distributed behavior explicit and aim to reduce these bugs by design, rather than relying solely on testing.
Tristan Hume discusses the evolution of a take-home test developed for hiring performance engineers at Anthropic. As AI models like Claude have improved, the test has been repeatedly redesigned to maintain its effectiveness in distinguishing human talent from AI capabilities. The article also shares insights from the original design and the challenges posed by increasingly capable AI systems.
The article evaluates the breathability of the AusAir AirWeave mask compared to meltblown masks. Testing shows it ranks in the middle for breathability and does not seal well on the face, raising questions about its performance. While it has good filtration claims, its testing standards differ from those of respirators.
Apple’s upcoming Siri upgrade for iOS 26.4 is experiencing issues in internal testing, which may delay the launch of new features. The company plans to distribute these capabilities across future updates, potentially pushing some functions to iOS 26.5 in May and iOS 27 in September.
The article discusses how AI changes the landscape of code reviews, making the reviewer's job more complex. It outlines specific heuristics for assessing pull requests (PRs), focusing on aspects like design, testing, error handling, and the effort put in by the author. The author emphasizes the need for human oversight despite advances in AI review tools.
The article argues that Continuous Integration (CI) is most valuable when it fails, as this indicates mistakes before deployment. It highlights the importance of catching errors early to prevent costly rollbacks and emphasizes that too much CI can slow down development without added benefits.
This article outlines how to effectively test Network Detection and Response (NDR) solutions through realistic simulations. It emphasizes using relevant metrics for evaluation and offers practical advice to avoid common testing mistakes.
Aura Inspector is a tool for testing Salesforce Experience Cloud applications. It helps identify misconfigurations, automate testing, and discover accessible records in both guest and authenticated contexts. You can run it in various modes, including unauthenticated and authenticated scenarios.
This article promotes a 30-minute demo of Momentic's AI-powered testing platform. It aims to explore current QA practices at your organization and discuss how Momentic can help achieve your team's objectives.
This article explains the importance of the act() function in testing React applications. It clarifies when to use act() to ensure state updates are processed correctly during tests, helping to avoid bugs and inaccuracies in assertions. The piece includes examples and best practices for implementing act() effectively.
This article outlines how Swagger facilitates API design, testing, and documentation with a focus on AI readiness. It highlights features that enhance collaboration among teams, enforce standards, and streamline workflows for both human and machine consumption. The platform also offers tools for contract testing and exploratory testing to ensure high-quality APIs.
Gotests is a tool that automatically generates table-driven tests for Go functions and methods by analyzing their signatures. It supports filtering, custom templates, and even AI-generated test cases, making it efficient for developers to ensure test coverage.
This article details how QA Wolf improved Drata's regression testing, reducing time from hours to minutes while increasing test coverage. By automating testing, Drata saved over $500,000 annually and allowed developers to focus on new features.
NEBULA is a PowerShell tool designed for testing Windows execution and persistence methods, including LOLBAS techniques. It provides a menu-driven interface for security researchers and teams to execute tests and log results. Example payloads sourced from Atomic Red Team are included for safe experimentation.
The author argues against traditional line-by-line code review, advocating for a harness-first approach where specifications and testing take priority. They draw on examples from AI-assisted coding and highlight the importance of architecture and feedback loops over direct code inspection. Caveats are noted for critical systems where code review remains essential.
The article details how an AI coding agent inadvertently led to an infinite recursion bug in a web application. A crucial comment was deleted during a UI refactor, resulting in a missing safety constraint that triggered browsers to freeze and crash. The author emphasizes the importance of tests over comments in an AI-augmented coding environment.
The article introduces two tools, Showboat and Rodney, designed to help coding agents demonstrate their work and automate browser tasks. Showboat creates Markdown documents to showcase agent-built features, while Rodney handles browser automation for capturing screenshots and executing JavaScript. Both tools aim to enhance the testing and demonstration process in software development.
This article explains how AI is changing the code review process, emphasizing the need for evidence of code functionality rather than just relying on AI-generated outputs. It contrasts solo developers’ fast-paced workflows with team dynamics, where human judgment remains essential for quality and security. The piece outlines best practices for integrating AI into development and review processes.
The article discusses the implementation of a --dry-run option in a reporting application. This feature allows the developer to preview actions without making changes, enhancing testing efficiency and providing quick feedback during development. It highlights both the benefits and minor drawbacks of using this option.
The article discusses the limitations of similarity scoring in matching API requests and suggests focusing on constraints instead. It highlights the importance of defining clear guardrails to avoid incorrect matches, particularly in testing environments. The approach aims to enhance precision in selecting the right mock data for API testing.
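The constraint idea described in this entry can be sketched generically: instead of scoring how similar an incoming request is to each recorded mock, a matcher accepts a mock only when every one of its declared constraints holds. A minimal illustration in Python (all names and the request shape are hypothetical, not the article's actual implementation):

```python
# Sketch: constraint-based mock selection instead of fuzzy similarity scoring.
# Request/mock shapes here are hypothetical, for illustration only.

def match_mock(request, mocks):
    """Return the first mock whose every constraint accepts the request,
    or None if no mock satisfies all of its constraints."""
    for mock in mocks:
        if all(check(request) for check in mock["constraints"]):
            return mock["response"]
    return None

mocks = [
    {
        # Only match GET requests to /users carrying a numeric 'id' param.
        "constraints": [
            lambda r: r["method"] == "GET",
            lambda r: r["path"] == "/users",
            lambda r: r["query"].get("id", "").isdigit(),
        ],
        "response": {"status": 200, "body": {"name": "Ada"}},
    },
]

print(match_mock({"method": "GET", "path": "/users", "query": {"id": "42"}}, mocks))
print(match_mock({"method": "GET", "path": "/users", "query": {"id": "abc"}}, mocks))
```

A near-miss request (`id=abc`) yields no match at all rather than a wrong "closest" mock, which is the precision gain the article is after.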
This article discusses how AI is changing the code review process for both solo developers and teams. It emphasizes the need for evidence of working code, highlights the risks of relying too heavily on AI, and outlines best practices for integrating AI into code reviews while maintaining human oversight.
This article outlines how Qodo developed a benchmark to evaluate AI code review systems. It highlights a new methodology that injects defects into real pull requests to assess both bug detection and code quality, demonstrating superior results compared to other platforms.
This article outlines the growth development methodology, which helps software engineers approach their work like product managers. It emphasizes data-driven testing, hypothesis validation, and efficient development processes to create measurable impacts on business metrics.
This article covers how AI test agents enhance voice AI testing by providing tools for autonomous simulation and scalable quality metrics. It details features like multi-speaker analysis, custom dashboards, and automated alerts that help teams improve their voice interactions.
This article discusses how to manage complex filter logic in applications, particularly when dealing with large data sets. It suggests implementing part of the filtering on the client side for better testability and correctness, while still using server-side queries for performance. The author provides practical examples and considerations for when to apply this approach.
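The split the article advocates can be sketched as follows: the server returns a coarse superset, and a pure, easily unit-tested predicate applies the fiddly filter logic on the client. A minimal sketch (domain and field names are hypothetical):

```python
# Sketch of the client-side half of the split: a pure predicate holds the
# complex filter logic, so it can be tested directly without a server.
# All field names are hypothetical.

def matches_filters(item, filters):
    """Pure predicate: True iff the item satisfies every active filter."""
    if "min_price" in filters and item["price"] < filters["min_price"]:
        return False
    if "tags" in filters and not set(filters["tags"]) <= set(item["tags"]):
        return False
    return True

def apply_filters(items, filters):
    return [i for i in items if matches_filters(i, filters)]

items = [
    {"name": "A", "price": 5, "tags": ["sale"]},
    {"name": "B", "price": 15, "tags": ["sale", "new"]},
]
print(apply_filters(items, {"min_price": 10, "tags": ["sale"]}))
```

Because `matches_filters` is pure, correctness tests need no database or network; the server query only has to over-fetch, never to be exactly right.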
Codacy introduces a hybrid code review engine that enhances Pull Request feedback by identifying logic gaps, security issues, and code complexity. It automates the review process, letting developers ship code faster and with more confidence.
OpenAI is set to launch new image generation models, Image-2 and Image-2-mini, designed to enhance visual quality and detail compared to the previous Image-1. Early tests show significant improvements in image fidelity and color accuracy, narrowing the gap with competitors like Google's Nano Banana 2. The rollout is likely to coincide with the anticipated GPT-5.2 release.
The article details an attempt to recreate the 1996 Space Jam website using the AI assistant Claude. Despite being given assets and guidance, Claude struggles with accuracy and measurement, producing increasingly incorrect versions of the layout. The author documents the process and the frustrations of working with AI on this project.
Meticulous is a tool that helps developers monitor application interactions and automatically generates a test suite for their code. By recording sessions and mocking backend responses, it ensures reliable tests without the hassle of setting up mock data. This approach allows teams to identify issues before merging changes.
Claude is being tested as a Chrome extension to enhance browser-based AI capabilities while addressing security risks like prompt injection. The pilot aims to gather feedback on safety and usability before a broader release, with participants having control over what Claude can do and access.
This article explores various design languages like linear design and their applications in UX. It emphasizes the importance of meeting accessibility standards and connecting design choices to real-world data and user behavior. The discussion also highlights the need for testing and validating conversion claims in SaaS marketing.
This article covers tunnl.gg, a tool that allows you to expose your localhost with a single command. It offers features like automatic HTTPS, random subdomains for easy sharing, and is useful for scenarios like webhook testing and mobile app testing.
This article encourages users to participate in testing assistive technologies to improve accessibility. It provides detailed results for various ARIA roles and attributes across multiple screen readers and browsers. Users can contribute their findings to enhance the project's effectiveness.
Terminal-Bench 2.0 launches with a new testing framework, Harbor, aimed at improving the evaluation of AI agents in terminal-based tasks. The update includes 89 validated tasks and addresses previous inconsistencies, while Harbor supports scalable testing in cloud environments.
Pinterest revamped its Android end-to-end testing by implementing a time-based sharding mechanism, which reduced build times by 36%. This new system balances the workload across testing shards, mitigating delays caused by slower tests. The switch to an in-house testing platform also improved reliability and developer efficiency.
This article argues for prioritizing user understanding before developing software. By creating detailed user profiles and simulating their interactions, developers can refine their products for better usability and quality. The approach shifts focus from traditional testing to a more user-centered design process.
The article stresses the importance of software engineers providing code that they have manually and automatically tested before submission. It emphasizes accountability in code reviews and the use of coding agents to assist in proving code functionality. Developers should include evidence of their tests to respect their colleagues' time and efforts.
The article discusses how a simple navigation tweak—separating "contact sales" and "contact technical support"—led to increased clicks and qualified calls for a brand. It highlights the importance of testing small changes that cater to user needs and improve engagement.
Vijil provides a framework for building reliable, secure, and compliant AI agents. It addresses enterprise concerns about trust through hardened models, continuous testing, and adaptive defenses, helping organizations deploy AI solutions faster and with greater confidence.
Skyramp is a testing platform designed to manage complex software applications. It offers features like automated test generation, execution in containerized environments, and intelligent maintenance to keep test suites updated and efficient. The platform focuses on integrating various testing aspects, ensuring reliable coverage and adaptability to changes.
The article explores how AI coding agents, like the Ralph Wiggum loop, automate software development by using clear specifications and robust testing. It highlights Simon Willison's success in creating an HTML5 parser while multitasking, demonstrating the potential of agents to handle complex tasks autonomously. The key lies in defining success criteria and verifying results efficiently.
This article explains how Atomic Design is a useful pattern for building user interfaces but not suitable as an application architecture. It highlights the risks of overcomplicating components and emphasizes the need to separate UI composition from application logic. The author proposes a structured approach for maintaining clarity and scalability in frontend applications.
This article explores the impact of product discovery on the development of features in tech companies, contrasting two scenarios: one that skips discovery and another that incorporates it. The findings highlight that while AI may speed up coding, thorough product discovery leads to better outcomes in user engagement and revenue.
Better Agents is a command-line tool designed to streamline the creation of coding assistants. It provides a structured framework and best practices for building agents, ensuring features are properly tested, prompts are versioned, and performance is evaluated. The setup includes a clear project structure and necessary configurations for effective collaboration.
This article explains a tool that tests which web features a browser recognizes. It clarifies that the test focuses on recognition, not on whether the features work correctly. Users can understand their browser's capabilities based on the results.
NativeBridge offers manual and AI-driven mobile app testing to streamline workflows for developers, QAs, and designers. Users can instantly access a variety of devices for real-time testing without needing setups or downloads. The platform emphasizes collaboration and supports multiple OS versions for extensive testing capabilities.
This article shares key lessons from a decade of frontend engineering, focusing on how to optimize testing for better development velocity. It emphasizes the importance of minimizing maintenance costs and choosing the right scope for tests to enhance coverage and reliability.
Memlab helps identify memory leaks in JavaScript applications running in browsers and Node.js. Users can define end-to-end test scenarios, run tests in the CLI, and analyze heap snapshots for memory issues. The tool offers various commands for specific memory analyses.
This article details the extensive testing procedures employed for SQLite, highlighting four independent test harnesses and millions of test cases. It covers various tests, including out-of-memory, I/O error, and crash tests, ensuring SQLite's reliability across different scenarios.
This article details Equixly's AI-driven tools that continuously test APIs for vulnerabilities. It highlights features like automated scanning, breach simulations, and compliance tracking to ensure secure code and minimize risks.
This article introduces Zig, highlighting its unique features and advantages over traditional languages like C and C++. It covers installation steps, basic programming concepts, and how to build and test programs. The focus is on practical insights for getting started with Zig.
SmartBear AI offers tools for automating software testing and quality assurance. Users can apply for a private beta to experience AI agents that run tests and generate audit reports. The SmartBear MCP server helps teams streamline workflows and improve testing processes without extensive coding knowledge.
Articos helps teams generate structured insights from ideas and landing pages in minutes. Users can choose between simulated interviews or landing page tests to get immediate feedback without the delays of traditional research methods. This tool aims to clarify messaging and validate concepts efficiently.
Quora transformed its QA process for the Poe.com AI chatbot by switching from manual testing to Momentic's automation tool. This change reduced daily test execution time from 7 hours to just 30 minutes, allowing for hundreds of critical test cases to be automated quickly.
The author used an AI tool to repeatedly modify a codebase, aiming to enhance its quality through an automated process. While the AI added significant lines of code and tests, many of the changes were unnecessary or unmaintainable, leaving the core functionality largely intact but cluttered. The exercise highlighted the pitfalls of prioritizing quantity over genuine quality improvements.
This article announces Superpowers 4.0, highlighting improvements in subagent-driven development and changes to skill descriptions to enhance clarity and usability. It also mentions the introduction of a basic test suite and increased use of GraphViz for internal documentation.
This article discusses the issues caused by frozen test fixtures in large codebases, where changes can lead to false test failures. It emphasizes writing focused tests to prevent fixture dependency problems and explores effective strategies for maintaining both fixtures and factories.
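The fixture-versus-factory trade-off the article explores can be sketched briefly: a frozen shared fixture couples many tests to one blob of data, while a factory builds a fresh object per test and pins only the fields that test cares about. A minimal sketch (the `make_user` helper and its fields are hypothetical):

```python
# Sketch of the factory pattern as an alternative to frozen fixtures.
# The domain object and defaults below are hypothetical.

def make_user(**overrides):
    """Factory: sensible defaults, with per-test overrides."""
    user = {"name": "default", "age": 30, "active": True}
    user.update(overrides)
    return user

# A test that cares only about 'active' states only that field, so a
# later change to an unrelated default (say, 'age') cannot break it:
banned = make_user(active=False)
assert banned["active"] is False
print(banned)
```

Each test gets its own object, so mutating it cannot cause false failures in other tests the way editing a shared frozen fixture can.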
QA Wolf offers a service that creates and manages automated test plans for your software. They cover a wide range of testing needs, including third-party integrations and API functionality. The service uses Playwright, ensuring you retain your test code even if you discontinue the service.
This article discusses how catching JiTTests (just-in-time tests) generated by large language models streamline the testing process in fast-paced software development. Unlike traditional tests, JiTTests adapt to code changes without the need for ongoing maintenance, focusing on catching serious bugs efficiently.
QA Wolf offers an AI-driven testing platform that automates complex tests, from APIs to mobile apps. It provides features like parallel test execution, detailed bug reporting, and seamless integration with CI tools. Users benefit from real-time support and significant time savings in their QA processes.
The article argues against testing multiple design options at the same time, explaining that it often leads to unclear results and requires more participants. It emphasizes the importance of focusing on one design, learning from it, and making necessary improvements.
The article discusses the issue of bot clicks in newsletters and how they can distort engagement metrics. The author shares a simple method to detect bot activity by embedding an unnoticeable link, revealing that their subscriber list appears mostly clean based on the results.
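The honeypot trick described here can be sketched in a few lines: embed a link that no human would ever see or click, then flag any subscriber whose click log contains it. A hypothetical illustration (the URL, HTML styling, and event shape are assumptions, not the author's exact setup):

```python
# Sketch of honeypot-link bot detection (all names hypothetical): a link
# no human should click; any click on it marks the subscriber as a likely bot.

HONEYPOT_URL = "https://example.com/nl/hp-link"  # hypothetical tracked URL

def honeypot_html(url):
    # An anchor that is effectively invisible to human readers.
    return f'<a href="{url}" style="display:none" aria-hidden="true">.</a>'

def flag_bots(click_events, honeypot_url):
    """Subscribers who 'clicked' the honeypot are likely automated scanners."""
    return {e["subscriber"] for e in click_events if e["url"] == honeypot_url}

clicks = [
    {"subscriber": "alice", "url": "https://example.com/article"},
    {"subscriber": "scanner-1", "url": HONEYPOT_URL},
]
print(flag_bots(clicks, HONEYPOT_URL))
```

Subscribers absent from the flagged set (here, everyone but `scanner-1`) are the "mostly clean" list the author reports.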
Momentic is a platform that automates testing for software teams, allowing them to create tests using natural language. It reduces the time needed for test automation, lowers false positives, and improves release cadence. The AI-driven tool adapts to changes in the application, making QA more efficient.
This article emphasizes the responsibility of software engineers to deliver code that has been thoroughly tested and proven to work, both manually and automatically. It argues against the trend of relying on AI tools to submit untested code and stresses the importance of accountability in the development process.
This article outlines critical errors in usability testing that can lead to misleading results. It details eight common mistakes, such as unclear goals and poor participant selection, and offers practical tips to improve the effectiveness of user tests.
This article discusses the benefits and challenges of using AI in programming from the perspective of a senior engineer. It shares practical tips and personal insights on how to effectively integrate AI tools into workflows while addressing common concerns about code quality and understanding.
Jay Schwedelson shares four innovative strategies for using email pre-headers effectively. He emphasizes their importance and suggests A/B testing to enhance engagement. This approach can give marketers an edge in their email campaigns.
An A/B test aimed at helping customers choose dog food based on their dog's size led to a 4.74% drop in conversion rate. Feedback suggests that customers prefer control and ease over added complexity, highlighting the importance of user-friendly design in e-commerce.
Signadot provides a platform for agile code validation using isolated sandboxes within Kubernetes. It allows developers and AI agents to quickly test and verify code changes in real-time, ensuring efficient workflows and reducing errors before merging.
Decipher AI automates end-to-end testing by generating tests from recorded actions or described workflows. It keeps tests updated automatically and alerts teams when users encounter bugs in production, ensuring quick resolutions and minimal disruption.
Meticulous automates testing by monitoring user interactions and generating a comprehensive test suite. It simplifies the testing process by recording sessions and providing side-effect free tests, allowing developers to see the impact of code changes before merging.
This article discusses the importance of evaluations (evals) for AI agents to identify issues before they reach users. It outlines the structure of evals, their benefits throughout an agent's lifecycle, and various grading methods to assess agent performance. The piece emphasizes how evals help teams maintain quality and adapt to new models efficiently.
This article introduces a JavaScript Effect System that separates the description of actions from their execution, improving testability and readability. It transforms traditional imperative coding into a declarative style, allowing for easier management of side effects and better handling of asynchronous tasks.
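The core pattern, separating the description of an effect from its execution, is language-neutral. The article works in JavaScript; the sketch below illustrates the same idea in Python, where a program yields effects as plain data and an interpreter decides how to run them, so tests can substitute fakes for real I/O (all handler names are hypothetical):

```python
# Language-neutral sketch of an effect system: effects are plain data and
# an interpreter dispatches them, so tests can swap in fake handlers.

def program():
    """Declarative program: yields effect descriptions, receives results."""
    user = yield ("fetch_user", 42)
    yield ("log", f"hello {user}")

def run(gen, handlers):
    """Interpreter: drives the generator, executing each yielded effect."""
    result = None
    try:
        while True:
            kind, payload = gen.send(result)   # send(None) starts the generator
            result = handlers[kind](payload)
    except StopIteration:
        pass

logged = []
fake_handlers = {
    "fetch_user": lambda uid: f"user-{uid}",   # fake instead of a real HTTP call
    "log": lambda msg: logged.append(msg),
}
run(program(), fake_handlers)
print(logged)
```

`program` never performs I/O itself, so swapping `fake_handlers` for real ones changes behavior without touching the program's logic, which is exactly the testability win the article claims.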
The article outlines a process for migrating a large codebase of frontend tests from React Testing Library v13 to v14 using AI tools. It details the challenges faced during the migration, including code changes and maintaining test coverage, while emphasizing the iterative approach taken to improve both the migration guide and the codemod.
RegreSQL automates regression testing for SQL queries in PostgreSQL. It runs your SQL files, compares the output to expected results, and alerts you to any changes. The tool supports snapshot management and allows for configuration of test parameters.
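The run-compare-alert loop that tools like RegreSQL automate can be sketched generically. This is not RegreSQL's actual implementation (and it uses SQLite rather than PostgreSQL purely to stay self-contained): run a query, compare its rows to a stored snapshot, and report any drift.

```python
# Generic sketch of snapshot-style SQL regression testing (not RegreSQL's
# actual code; SQLite stands in for PostgreSQL to keep this self-contained).
import sqlite3

def check_regression(conn, sql, expected_rows):
    """Run the query and compare its rows to a previously recorded snapshot."""
    actual = conn.execute(sql).fetchall()
    return {"passed": actual == expected_rows, "actual": actual}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("Ada", 36), ("Alan", 41)])

snapshot = [("Ada", 36)]  # expected output recorded on an earlier run
report = check_regression(conn, "SELECT * FROM users WHERE age < 40", snapshot)
print(report["passed"])
```

A schema or query change that alters the result set flips `passed` to `False`, and the `actual` rows show exactly what drifted from the snapshot.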
A GitHub CLI extension, gh-signoff, allows developers to run tests locally and sign off on their work without relying on cloud CI services. It emphasizes utilizing fast local machines for continuous integration, providing options for full or partial signoffs on various CI steps. The extension is open-source and can be easily installed and configured for projects.