July 2018 update
Over the past couple of months, we have been making structural changes to our data APIs in preparation for vendor testing. Some of this work is preparatory and will be built upon further, but it already provides an early version for software vendors to give feedback on. In this blog post we explain some of these changes and how they should help the biobank and research communities.
Authentication and Authorisation – Multiple sample resource claims per user, OAuth 2
- In our initial prototype, much like our Directory website, each user account could only be an administrator for a single sample resource. We now know that, in practice, users are often administrators for multiple sample resources, particularly within networks.
- We have updated our APIs, and work is ongoing on the Directory website, to allow each user to register and administer multiple sample resources. From the API point of view, obtaining a JWT (authorisation token) happens in the same way as before, but the claims component of the JWT now contains a claim for each sample resource that the user has administrator access to.
- When posting data or retrieving lists of previous submissions, a 'biobankId' property is passed in the URI to identify which sample resource the request relates to.
- We have also built out our OAuth 2 implementation, which paves the way for token revocation and refresh mechanisms in the future.
- This should make administrators' lives easier, as it removes the need for separate accounts to manage multiple biobanks.
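As a rough illustration of how a client might read those per-resource claims, the sketch below decodes the payload segment of a JWT. The claim name 'biobankIds' is illustrative only, not the API's actual claim name, and real code must verify the token's signature rather than merely decoding it:

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode the claims (payload) segment of a JWT.

    Illustration only: this does NOT verify the signature, which any
    real client or server must do before trusting the claims.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

With multi-resource claims, a client would then check that the 'biobankId' it is about to place in the request URI actually appears among the token's claims before submitting.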
Data Model updates
The following changes have been made to the data model:
- Where a SNOMED CT term was previously required, e.g. 'diagnosisCode', two additional mandatory properties have been added in which the ontology and ontology version must be specified. This is preparatory work for when we implement support for other ontologies, such as ICD. This currently applies to:
- Diagnosis Code (Diagnosis objects)
- Treatment Code (Treatment objects)
- Extraction Site (Sample objects)
- For extracted samples, the sample content and sample content method fields are required to either both be empty/not specified, or both be specified and valid reference values.
- Sex is now a mandatory field (with a value for 'Unknown') for sample objects. The accepted values come from MIABIS Core. This is in preparation for making similar changes to the Directory.
- Material Type accepted values have been updated to reflect MIABIS Core standards.
- AgeAtDonation and YearOfBirth are now validated against each other: at least one of the two fields must be specified, and if both are specified they are checked for consistency with each other, to within a 365-day margin of error.
- Volume/VolumeUnit have been made optional fields. There is currently no validation on these fields.
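To make the AgeAtDonation/YearOfBirth cross-check concrete, here is a minimal sketch of the rule as described above. The function and field names are ours for illustration, not the API's, and the one-year tolerance is an approximation of the 365-day margin:

```python
from datetime import date

def check_age_fields(age_at_donation, year_of_birth, donation_date: date) -> bool:
    """Sketch of the cross-validation described above (names are illustrative).

    At least one of AgeAtDonation / YearOfBirth must be given; when both
    are given, the age implied by the year of birth must agree with the
    stated age to within roughly the 365-day margin of error.
    """
    if age_at_donation is None and year_of_birth is None:
        return False  # at least one field must be specified
    if age_at_donation is not None and year_of_birth is not None:
        implied_age = donation_date.year - year_of_birth
        # Allow one year either way, approximating the 365-day margin.
        return abs(implied_age - age_at_donation) <= 1
    return True
```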
Feedback on errors
When validation errors are present in submitted objects, the error messages now include the identifying properties of the object in question:
- For Diagnosis objects, the individual reference ID
- For Sample objects, the barcode and individual reference ID
- For Treatment objects, the individual reference ID
This should help client applications track down problematic entities in their submissions in order to address validation errors.
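As an illustration, an error response for a problematic Sample object might look something like the following. The exact property names and response shape shown here are illustrative, not the API's actual contract:

```json
{
  "errors": [
    {
      "objectType": "Sample",
      "barcode": "SAMPLE-00123",
      "individualReferenceId": "IND-0456",
      "message": "Material Type 'Gas' is not a recognised reference value"
    }
  ]
}
```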
Our prototype APIs originally could handle just 14 samples being submitted before timing out. Now that we are moving towards production, we have implemented caching and reduced the number of database queries to just two, regardless of the number of samples submitted. We are able to handle around 2,000 samples being submitted in one batch and will implement further improvements to increase this to above 10,000.
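The general batching pattern behind this kind of improvement can be sketched as follows. This is an illustrative sketch only, not taken from our actual implementation: validation data is fetched with a fixed number of lookups up front, and every sample is then checked in memory, so the query count no longer grows with batch size:

```python
def validate_samples(samples, fetch_reference_data, fetch_existing_barcodes):
    """Illustrative sketch of validating a whole batch with a fixed number
    of lookups: reference data and existing records are each fetched once
    (two queries total), then every sample is checked in memory."""
    reference_data = fetch_reference_data()       # query 1
    existing_barcodes = fetch_existing_barcodes() # query 2
    errors = []
    for sample in samples:
        if sample["materialType"] not in reference_data["materialTypes"]:
            errors.append((sample["barcode"], "invalid material type"))
        if sample["barcode"] in existing_barcodes:
            errors.append((sample["barcode"], "duplicate barcode"))
    return errors
```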
We have addressed the following bugs from our prototype:
- Implemented validation for Availability values
- Implemented validation for Storage Temperatures
- Resolved data issue where the degrees symbol (°) was unrecognised in the database
- Fixed the first sample in each submission batch not being validated against the material type field
- Fixed timeout errors being reported without messages when connections between backend APIs timed out
- Fixed configuration issues affecting documented deployment mechanisms
Work is now underway in the following priority areas:
- Ensuring that the service can handle the submission of around 10,000 objects per request
- Moving infrastructure between Azure tenancies
- Withdrawal of consent
- Committing/rejecting submissions that contain 10,000 or more objects
- Migrating the Directory site to the cloud
- Penetration testing
- Data agreements
- Making reference data fields case-insensitive
- Backend rearchitecting
- Beginning test-driven development