Data Collection – International Surgical Trials Toolkit

Data collection is not that different from any UK-based multi-centre trial, but the international aspect adds another layer of complexity. For example, data items that are ‘standard’ in one hospital may not be so in others, and in an international trial this variation will exist both between countries and possibly between hospitals within a country. Some of this will depend on how the study is structured, such as having the coordinating centre liaise directly with individual sites, or having a lead site for each country that can help, particularly with language, differences in health care systems, and IT challenges.

There are different models that can be used for data acquisition: a single study-wide database, or a separate database in each country/territory with the data brought together by the coordinating centre. The choice may depend on the collaboration model chosen for the study, as each solution will be easier or harder to implement depending on that model.

Single Database

Multiple Databases

A single database will result in comparable data across the study that have been subjected to the same validation and collected in the same way. It will, however, take more work to ensure that data fields reflect local differences between sites.

General Recommendations

Avoid questions that are answered with free text. Free text answers are never easy to deal with, and differences in language or terminology make them harder still. If drop-down or yes/no questions are not feasible, consider ‘code lists’, where a numerical or alphanumeric code is used instead of the full name or free text.
If a question has a choice of answers, make sure the options are mutually exclusive and the list is exhaustive (unless you include an ‘other’ category).
If more than one option can be selected, consider a yes/no option for each option:
- E.g. if participants can experience more than one symptom, consider a yes/no for each symptom. This is easier to validate on the database, and you can also be sure that symptoms have not been missed by accident. With a single ‘tick all that apply’ list, an empty box could be a genuine ‘no’ or could mean that the box had been skipped.
Avoid ambiguous questions.
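
The recommendations above can be sketched in a few lines of Python. This is illustrative only: the code list, the symptom names and the helper functions are hypothetical and are not taken from any particular trial database. It shows a coded answer being decoded, and a yes/no field per symptom being checked so that a skipped answer is reported rather than silently treated as ‘no’.

```python
# Hypothetical code list: short codes replace free-text answers.
APPROACH_CODES = {
    "1": "Open",
    "2": "Laparoscopic",
    "3": "Robotic",
    "9": "Other",
}

# Hypothetical symptom list; each symptom gets its own yes/no field.
SYMPTOMS = ("nausea", "fever", "wound_pain")
VALID_ANSWERS = {"yes", "no"}


def decode_approach(code):
    """Return the label for a code, or raise if the code is not on the list."""
    if code not in APPROACH_CODES:
        raise ValueError(f"unknown approach code: {code!r}")
    return APPROACH_CODES[code]


def validate_symptoms(record):
    """Return the symptoms that were skipped or have an invalid answer."""
    problems = []
    for symptom in SYMPTOMS:
        # A missing key gives None, which is neither "yes" nor "no".
        if record.get(symptom) not in VALID_ANSWERS:
            problems.append(symptom)
    return problems
```

For example, `validate_symptoms({"nausea": "yes", "fever": "no"})` reports `wound_pain` as unanswered, which a tick-box list could not distinguish from a genuine ‘no’.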

Legislation

Consider whether data protection laws are different or have different requirements. Even the same law (e.g. GDPR) may be interpreted differently locally.
Consider legislation around any data sharing required as part of the trial, and around potential future data sharing.

Issues arising from different health care systems

Are the data items available in overseas centres? It shouldn’t be assumed that data items that are ‘standard’ in the UK will be ‘standard’ in other countries.
Does the data collection rely on a particular patient pathway? Differences in healthcare provision may affect this.
Is follow-up data required after discharge from the index operation, or data not collected by the recruiting centre? Out of hospital health care provision may be very different in other countries.
Does the data collection allow for differences in standard practice? Is terminology standardised or have you allowed for local variations?
Use local experts where possible to check the above.

IT/eCRF issues

Data acquisition
- Will this be via a hub or direct to the co-ordinating centre?
- Direct data entry or scan/post forms? If the latter how will they be returned securely and in a timely manner? Also if the latter, how will data queries be handled?
- Are there any PROMs? Will the questionnaires be sent by sites or centrally?
- How will PROMs data be collected?
Where should the database be hosted? NHS/University servers/networks are generally used for UK-only studies, but other arrangements may be required for international studies; this may matter less if data entry is centralised.
Robustness and support – especially if used over very diverse and distant network with disparate platforms.
Availability of the network – may necessitate complex ‘off-line’ solutions/software that can sync themselves.
Data security – especially if any offline data collection is involved.
Data integrity – especially if any offline data collection is involved (i.e. ensuring data is not lost or changed without an audit record).
IT support may be required in multiple languages and available 24 hours to allow for different time zones.
Central user management will be more challenging.
User acceptability testing may be more complex.
Differences in time zone may cause issues, e.g. processes may be scheduled to run overnight UK time and slow the system down when other countries want access.
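
The time zone point can be checked with a small sketch before scheduling overnight jobs. It assumes Python 3.9 or later (for the standard `zoneinfo` module); the site list and the `local_times_of_uk_job` helper are hypothetical. It converts a job time given in UK local time into the local wall-clock time at each site, so that clashes with working hours are visible.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical site time zones; replace with the trial's actual sites.
SITE_ZONES = {
    "India": "Asia/Kolkata",
    "Australia": "Australia/Sydney",
}


def local_times_of_uk_job(uk_naive, site_zones=SITE_ZONES):
    """Map each site to the local wall-clock time of a job scheduled in UK time."""
    # Interpret the naive datetime as UK local time (GMT or BST as appropriate).
    uk_time = uk_naive.replace(tzinfo=ZoneInfo("Europe/London"))
    return {
        site: uk_time.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M")
        for site, zone in site_zones.items()
    }


# A maintenance job at 02:00 UK time on 15 January 2024.
job_times = local_times_of_uk_job(datetime(2024, 1, 15, 2, 0))
```

Here a job that runs at 02:00 in the UK falls at 07:30 in India and 13:00 in Sydney, i.e. during the working day at both sites.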

Language issues

Do CRFs (paper or electronic) need translation (and back translation)? It is recommended that CRFs are in English if possible.
Are there any PROMs? Do they need translation? If they are validated instruments, are they available in other languages? If not, the authors may not authorise their use in another language.
Use of SMS text messages (e.g. questionnaire reminders, or even simple questionnaires via text) may not be possible especially if they need to be translated into local languages.
Is there an agreed ‘working language’?

Other considerations

Encouraging participating sites to do a ‘dry run’ with CRFs to identify any potential problems is a useful tool. If this isn’t done, use the SIV to go through the CRFs with the site to identify
User training requirements.
Managing DEQCs (data entry quality check – where a random selection of CRFs are checked against the database to check for data entry mistakes that would not be picked up by inbuilt validation checks) and DCFs (data clarification forms) will both be more challenging.
Units of measurement and standard ranges may have more variability than a UK based trial. Can the database accept/cope with different units?
Health economics: is this even feasible in international studies?
How will any non-data information be collected? E.g. scans, audio recordings, videos, photos.
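
On the question of units, one option is to record the unit alongside each value and convert to a study-wide standard on entry or at analysis. The sketch below is a minimal illustration under that assumption: the field names, the choice of standard units and the `to_standard_unit` helper are all hypothetical. The conversion factors themselves (1 lb = 0.45359237 kg, 1 g/dL = 10 g/L) are standard.

```python
# Conversion factors to a hypothetical study-wide standard unit per field.
# weight: kilograms; haemoglobin: grams per litre.
TO_STANDARD = {
    ("weight", "kg"): 1.0,
    ("weight", "lb"): 0.45359237,
    ("haemoglobin", "g/L"): 1.0,
    ("haemoglobin", "g/dL"): 10.0,
}


def to_standard_unit(field, value, unit):
    """Convert a value to the study standard unit, rejecting unknown units."""
    try:
        factor = TO_STANDARD[(field, unit)]
    except KeyError:
        # Fail loudly instead of storing a value in an unrecognised unit.
        raise ValueError(f"no conversion for {field} in {unit!r}") from None
    return value * factor
```

Rejecting unknown units, rather than guessing, means that a site using an unexpected unit produces a data query instead of a silently wrong value.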

Separate databases allow for differences across countries, such as language, patient pathways and units of measurement. However, they will require additional work to ensure consistent data collection between databases, so that the data can be merged for a single trial analysis.
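
The merging step might look something like the following sketch, in which each country's export is renamed onto a shared schema before the rows are combined. Everything named here is an assumption for illustration: the country codes, the local and shared field names, and the `harmonise` and `merge_databases` helpers. A real study would also need to harmonise coding, units and validation rules, not just field names.

```python
# Hypothetical per-country field names mapped onto a shared study schema.
FIELD_MAPS = {
    "UK": {"pid": "participant_id", "op_date": "operation_date"},
    "DE": {"patient_nr": "participant_id", "op_datum": "operation_date"},
}


def harmonise(country, rows):
    """Rename local fields to the shared schema and tag each row with its country."""
    mapping = FIELD_MAPS[country]
    merged = []
    for row in rows:
        # Refuse to drop or guess fields that have no agreed mapping.
        unmapped = set(row) - set(mapping)
        if unmapped:
            raise ValueError(f"{country}: unmapped fields {sorted(unmapped)}")
        new_row = {mapping[name]: value for name, value in row.items()}
        new_row["country"] = country
        merged.append(new_row)
    return merged


def merge_databases(exports):
    """Combine exports from every country into one list of harmonised rows."""
    combined = []
    for country, rows in exports.items():
        combined.extend(harmonise(country, rows))
    return combined
```

Agreeing the field mapping before recruitment starts, rather than at analysis, is what keeps this step small.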