Mendeley Data API Issue Report

Summary

A user reported critical failures in version 2 of the Mendeley Data public API. There were two symptoms. First, the search endpoint returned random datasets unrelated to the query. Second, direct dataset lookups by ID and version did not return the requested record. In both cases the API appears to return a generic or cached list of results and ignores the path identifiers and query parameters. This points to a breakdown in the routing or request-handling logic for these endpoints.

Root Cause

The root cause is likely an implementation error in the API gateway or in the application logic that parses and fulfills the requests. The symptoms point to a two-step failure chain:

  1. Query Parameter/URL Parsing Failure: The backend service is failing to correctly parse the search query parameter from the URL string or failing to extract the dataset_id and version from the URL path parameters.
  2. Fallback to Default Endpoint: Due to this parsing failure, the request handler is likely falling back to a default behavior, such as executing a generic database query (e.g., SELECT * FROM datasets LIMIT 10) or serving a pre-cached “featured” or “latest” list of datasets, completely ignoring the user’s intended filter.
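The sketch below is a minimal, hypothetical illustration of this failure chain and is not Mendeley's code. In buggy_search, the handler reads a parameter named q while clients send search. That mismatch is an assumed example chosen only to show the pattern. LATEST_DATASETS stands in for a pre-cached “latest” list. strict_search shows the alternative, which fails loudly when the expected parameter is missing.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-ins for a pre-cached "latest" list and the full catalog.
LATEST_DATASETS = [
    {"id": "aaa111", "name": "Latest dataset A"},
    {"id": "bbb222", "name": "Latest dataset B"},
]
ALL_DATASETS = LATEST_DATASETS + [{"id": "ccc333", "name": "USDA crop yields"}]


def buggy_search(url: str) -> list:
    """Reads the wrong parameter name, then silently falls back to the default list."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("q", [None])[0]  # assumed bug: clients send "search", not "q"
    if not term:
        return LATEST_DATASETS  # silent fallback hides the parsing failure
    return [d for d in ALL_DATASETS if term.lower() in d["name"].lower()]


def strict_search(url: str) -> list:
    """Reads the expected parameter and raises instead of guessing."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("search", [None])[0]
    if term is None:
        raise ValueError("missing required 'search' parameter")
    return [d for d in ALL_DATASETS if term.lower() in d["name"].lower()]


url = "https://example.org/api/datasets?search=USDA&limit=10"
print([d["id"] for d in buggy_search(url)])   # ['aaa111', 'bbb222']
print([d["id"] for d in strict_search(url)])  # ['ccc333']
```

The sketch ignores limit for brevity. The silent fallback turns a parsing bug into a response that looks successful but is irrelevant, which matches the reported symptoms.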

Why This Happens in Real Systems

This type of failure is common in systems that rely on web application frameworks (like Express.js, Flask, or Django) where URL routing configurations can be ambiguous or incorrect.

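  • Route Precedence Issues: A generic catch-all or prefix route (e.g., /api/datasets/*) might be catching requests intended for more specific routes (e.g., /api/datasets/:id/:version). First-match routers, such as Express route handlers or Django urlpatterns, dispatch to the first pattern that matches in declaration order.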
  • Middleware Misconfiguration: Middleware responsible for sanitizing or parsing query strings might be stripping parameters or throwing silent errors, causing the controller to use default values.
  • Controller Bypass: The controller function for the specific endpoint might never run because the route is wired to the wrong handler function.
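A toy dispatcher can demonstrate route shadowing in a first-match router. FirstMatchRouter and the two handlers are hypothetical. They only mimic declaration-order matching and are not the routing internals of Express, Flask, or Django. The request path from the report is sent through two routers. The first declares the catch-all first, and the second declares the specific route first.

```python
import re
from typing import Callable

Handler = Callable[[dict], str]


class FirstMatchRouter:
    """Toy router: tries patterns in declaration order and uses the first match."""

    def __init__(self) -> None:
        self.routes = []  # list of (compiled regex, handler) pairs

    def add(self, pattern: str, handler: Handler) -> None:
        # ":name" segments become named groups, e.g. ":id" -> (?P<id>[^/]+).
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
        # A "*" acts as a catch-all that matches the rest of the path.
        regex = regex.replace("*", ".*")
        self.routes.append((re.compile(f"^{regex}$"), handler))

    def dispatch(self, path: str) -> str:
        for compiled, handler in self.routes:
            match = compiled.match(path)
            if match:
                return handler(match.groupdict())
        return "404"


def list_latest(_params: dict) -> str:
    return "generic list of latest datasets"


def get_version(params: dict) -> str:
    return f"dataset {params['id']} version {params['version']}"


path = "/api/datasets/pgjvbwznk5/1"

shadowed = FirstMatchRouter()
shadowed.add("/api/datasets/*", list_latest)            # catch-all declared first
shadowed.add("/api/datasets/:id/:version", get_version)
print(shadowed.dispatch(path))  # generic list of latest datasets

fixed = FirstMatchRouter()
fixed.add("/api/datasets/:id/:version", get_version)    # specific route first
fixed.add("/api/datasets/*", list_latest)
print(fixed.dispatch(path))  # dataset pgjvbwznk5 version 1
```

Both routers contain the same two routes. Only the declaration order differs, and that order alone decides which handler runs.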

Real-World Impact

  • Loss of Trust: Users relying on the API for automated workflows lose confidence in the service.
  • Broken Integrations: Any code written to fetch specific datasets or search for specific data is broken, halting downstream processes.
  • Data Inaccessibility: Even though the data exists and is accessible via the web UI, it becomes effectively invisible to programmatic access, which is often the primary method for researchers and data scientists.

Example or Code

The user provided the following Python code to demonstrate the failure. The script searches for “USDA” and looks up a specific dataset ID and version, but the API responses contradict the input.

```python
import requests

# Test 1: Search for USDA
# The API returns random datasets instead of those matching 'USDA'
response = requests.get(
    "https://data.mendeley.com/api/datasets",
    params={"search": "USDA", "limit": 10},
    headers={"Accept": "application/json"},
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
data = response.json()
results = data.get("data", {}).get("results", [])
print("Search for 'USDA' returned:")
for r in results[:5]:
    print(f" - {r.get('name', 'Unknown')}")
    print(f"   ID: {r.get('id')}")

# Test 2: Direct ID lookup
# The API ignores the ID and version, returning random data
response = requests.get(
    "https://data.mendeley.com/api/datasets/pgjvbwznk5/1",
    headers={"Accept": "application/json"},
    timeout=30,
)
response.raise_for_status()
data = response.json()
print("\nDirect lookup of pgjvbwznk5/1:")
print(f" Response keys: {list(data.keys())}")
```
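Two small checks on the parsed JSON can turn the script's printed output into pass/fail signals. This is a sketch. It assumes results expose name and id fields, as the reproduction script does. The actual response schema of the Mendeley Data API is not confirmed here. The sample payloads are hypothetical and are not captured API output.

```python
def search_looks_relevant(results: list, term: str) -> bool:
    """True if at least one result mentions the search term in its name.

    Assumes each result has a 'name' field, as in the reproduction script.
    """
    needle = term.lower()
    return any(needle in str(r.get("name", "")).lower() for r in results)


def lookup_matches(record: dict, expected_id: str) -> bool:
    """True if a direct lookup returned the record that was requested.

    Assumes the record exposes its identifier under 'id'.
    """
    return record.get("id") == expected_id


# Hypothetical payloads shaped like the reported symptoms, not real API output.
unrelated_results = [{"id": "x1", "name": "Soil microbiome survey"}]
wrong_record = {"id": "some-other-id"}

print(search_looks_relevant(unrelated_results, "USDA"))  # False
print(lookup_matches(wrong_record, "pgjvbwznk5"))        # False
```

A name-only check can produce false negatives, because a relevant dataset might mention USDA only in its description. A False result is therefore a signal to investigate, not proof of the bug.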

How Senior Engineers Fix It

  • Immediate Triage: Verify the routing tables in the application code. Check that the specific routes for ID lookup (/datasets/:id/:version) and search (/datasets) are defined correctly and are not shadowed by a catch-all route.
  • Logging and Tracing: Add detailed logging at the API entry point that records the raw URL parameters (req.query and req.params in Express) to confirm they reach the server as expected.
  • Code Review: Audit the controller logic. Ensure that it branches correctly based on which query parameters and path parameters are present. If the parameters arrive but are ignored, look for mismatched parameter names or variable assignment errors.
  • Testing: Write unit tests that target these endpoints with the exact parameters from the bug report, so the issue can be reproduced locally.
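The logging and testing steps can be combined in a self-contained regression test. handle, CATALOG, and ALLOWED_SEARCH_PARAMS are hypothetical stand-ins for the real service. In an Express or Django backend, the same assertions would go through that framework's test client against the actual routes. The handler does three things: it logs raw inputs on entry, checks the specific lookup route before the listing route, and rejects unrecognized query parameters instead of silently ignoring them.

```python
import logging
import re
import unittest

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("datasets-api")

# Hypothetical in-memory catalog standing in for the real backend.
CATALOG = [
    {"id": "pgjvbwznk5", "version": 1, "name": "Hypothetical USDA dataset"},
    {"id": "zzz999", "version": 2, "name": "Unrelated dataset"},
]
ALLOWED_SEARCH_PARAMS = {"search", "limit"}
LOOKUP_RE = re.compile(r"^/api/datasets/(?P<id>[^/]+)/(?P<version>\d+)$")


def handle(path: str, query: dict) -> dict:
    # Log raw inputs at the entry point so mismatches are visible server-side.
    log.info("path=%s query=%s", path, query)
    m = LOOKUP_RE.match(path)  # the specific lookup route is checked first
    if m:
        for d in CATALOG:
            if d["id"] == m["id"] and str(d["version"]) == m["version"]:
                return {"status": 200, "data": d}
        return {"status": 404, "data": None}
    if path == "/api/datasets":
        unknown = set(query) - ALLOWED_SEARCH_PARAMS
        if unknown:
            # Reject unrecognized parameters instead of silently ignoring them.
            return {"status": 400, "data": sorted(unknown)}
        term = query.get("search", "").lower()
        limit = int(query.get("limit", 10))
        hits = [d for d in CATALOG if term in d["name"].lower()]
        return {"status": 200, "data": hits[:limit]}
    return {"status": 404, "data": None}


class DatasetsRegressionTest(unittest.TestCase):
    """Replays the exact requests from the bug report."""

    def test_search_usda_returns_only_matches(self):
        resp = handle("/api/datasets", {"search": "USDA", "limit": "10"})
        self.assertEqual(resp["status"], 200)
        self.assertTrue(resp["data"])
        self.assertTrue(all("usda" in d["name"].lower() for d in resp["data"]))

    def test_lookup_returns_requested_record(self):
        resp = handle("/api/datasets/pgjvbwznk5/1", {})
        self.assertEqual(resp["status"], 200)
        self.assertEqual(resp["data"]["id"], "pgjvbwznk5")

    def test_unknown_version_is_not_found(self):
        self.assertEqual(handle("/api/datasets/pgjvbwznk5/7", {})["status"], 404)

    def test_unknown_parameter_is_rejected(self):
        self.assertEqual(handle("/api/datasets", {"q": "USDA"})["status"], 400)
```

Saved as a module, the test case can be run with Python's built-in runner (python -m unittest). Rejecting unknown parameters is a design choice. Some APIs prefer to ignore them, but then a misspelled parameter cannot be told apart from a successful unfiltered request.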

Why Juniors Miss It

  • Framework Intuition: Junior developers often assume frameworks handle routing “magically.” In first-match routers, a specific route such as /api/datasets/:id/:version must be declared before any catch-all or prefix route (e.g., /api/datasets/*). Otherwise the catch-all swallows the specific request, and juniors may not realize this.
  • Ignoring Server Logs: They might focus entirely on the client-side code (the Python script) and fail to look at the server-side application logs, which would immediately show if the wrong controller function is being invoked.
  • Ambiguous Route Definitions: They might write a route definition that matches both GET /datasets and GET /datasets/:id but fail to handle the conditional logic inside the handler to distinguish between the two cases.
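When one handler serves both GET /datasets and GET /datasets/:id, the branching has to be explicit. The sketch below contrasts two hypothetical handlers. The first never checks dataset_id. The second branches on which path parameters are actually present. DATASET_VERSIONS, LATEST_LISTING, and the choice to return the newest version when none is given are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical data: dataset id -> {version string -> record}.
DATASET_VERSIONS = {"pgjvbwznk5": {"1": {"id": "pgjvbwznk5", "version": "1"}}}
LATEST_LISTING = [{"id": "latest-1"}, {"id": "latest-2"}]


def ambiguous_handler(dataset_id: Optional[str] = None, version: Optional[str] = None):
    # Bug: one handler serves both routes but never inspects dataset_id,
    # so every request falls through to the generic listing.
    return LATEST_LISTING


def explicit_handler(dataset_id: Optional[str] = None, version: Optional[str] = None):
    # Branch on which path parameters were actually supplied.
    if dataset_id is None:
        return LATEST_LISTING  # GET /datasets: a listing is the intended behavior
    versions = DATASET_VERSIONS.get(dataset_id)
    if versions is None:
        raise KeyError(f"unknown dataset {dataset_id!r}")  # map to 404 in a real API
    if version is None:
        return versions[max(versions, key=int)]  # assumption: newest version by default
    if version not in versions:
        raise KeyError(f"dataset {dataset_id!r} has no version {version!r}")
    return versions[version]


print(ambiguous_handler("pgjvbwznk5", "1"))  # [{'id': 'latest-1'}, {'id': 'latest-2'}]
print(explicit_handler("pgjvbwznk5", "1"))   # {'id': 'pgjvbwznk5', 'version': '1'}
```

Raising on an unknown ID or version keeps a bad lookup from being disguised as a generic listing.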