Metadata
Adding metadata to your directory is important.
Right now you know what the data is about and what you did with it, but will you still remember this in three years?
As stated multiple times before, it is strongly recommended to add metadata files to your directories: these documentation files record critical context, such as how and when your data was generated and processed.
The more detailed your metadata file is, the more you:
- help your future self understand your own work months or even years later
- enable colleagues or successors to build upon your research findings effectively
- increase findability & reproducibility of your findings
- fulfil the requirements for data sharing and publication (!)
Your DMP (data management plan), stored alongside your data, can serve as a foundational metadata document: it contains further important information and is a handy reference point for anyone accessing the dataset.
Examples of information in a metadata file
- Sequencing machine
- Settings for data generation
- Pipeline used to analyse the data
- Parameters used for certain tools
- Location of the source code
- Versions of the tools used
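Items such as tool versions can be captured automatically rather than typed from memory. The following is a minimal sketch, assuming a metadata file made of `| Field | value` rows like the one produced by the script in this section. The helpers `append_row` and `tool_version`, the field names, and the temporary file are hypothetical and only serve as illustration. Not every tool supports `--version`, so check the flag for each tool you record.

```shell
#!/bin/bash
# Sketch: append tool versions to a metadata file.
# append_row, tool_version and the field names are hypothetical examples.

# Append one "| Field | value" row to the metadata file given as $1.
append_row() {
    local file="$1" field="$2" value="$3"
    printf '| %s | %s\n' "$field" "$value" >> "$file"
}

# Print the first line of "<tool> --version", or "not installed" if the
# tool cannot be found on the PATH.
tool_version() {
    local tool="$1"
    if command -v "$tool" > /dev/null 2>&1; then
        "$tool" --version 2>&1 | head -n 1
    else
        echo "not installed"
    fi
}

METADATA_FILE="$(mktemp)"   # stand-in for your real metadata file
append_row "$METADATA_FILE" "Bash Version" "$(tool_version bash)"
append_row "$METADATA_FILE" "Recorded On" "$(date +%Y-%m-%d)"
cat "$METADATA_FILE"
```

Running this at the end of an analysis, with the tools of your own pipeline in place of `bash`, keeps the recorded versions in step with what was actually installed.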
To help with this, we provide two options:
- A script that generates the metadata file
- An example metadata text file that you can copy
Content of the files
- Script that you can run
#!/bin/bash

# Prompt for location and define the metadata file path
read -r -p "Enter location: " LOCATION
mkdir -p "$LOCATION"

# Collecting project name
read -r -p "Fill in the Project Name (without spaces): " PROJECT_NAME
METADATA_FILE="${LOCATION}/METADATA_${PROJECT_NAME}.txt"
echo "Metadata file: ${METADATA_FILE}"

# Project Information
echo "| Field | Description |" >> "$METADATA_FILE"
echo "|-------------------------|--------------------------------------------------------------------------------------|" >> "$METADATA_FILE"
echo "| Project Name | ${PROJECT_NAME}" >> "$METADATA_FILE"
read -r -p "Give a small project description: " PROJECT_DESCRIPTION
echo "| Project Description | ${PROJECT_DESCRIPTION}" >> "$METADATA_FILE"
read -r -p "Start date of the project: " PROJECT_START_DATE
echo "| Start Project Date | ${PROJECT_START_DATE}" >> "$METADATA_FILE"
read -r -p "Current status of the project (e.g., submitted, in progress, completed): " PROJECT_STAT
echo "| Project Status | ${PROJECT_STAT}" >> "$METADATA_FILE"

# User Information
read -r -p "Name of the user requesting the service: " USER_NAME
echo "| User Name | ${USER_NAME}" >> "$METADATA_FILE"
read -r -p "Email address of the user requesting the service: " USER_EMAIL
echo "| User Email | ${USER_EMAIL}" >> "$METADATA_FILE"
read -r -p "Principal investigator: " PRI_INV
echo "| Principal Investigator | ${PRI_INV}" >> "$METADATA_FILE"
read -r -p "Collaborator: " COLLAB
echo "| Collaborator | ${COLLAB}" >> "$METADATA_FILE"

# Service Request Information
read -r -p "Type of bioinformatics service requested (e.g., sequencing, analysis, consultation): " SERVICE_TYPE
echo "| Service Type | ${SERVICE_TYPE}" >> "$METADATA_FILE"

# Sample Information
read -r -p "Type of biological sample (e.g., DNA, RNA, whole genome, exome): " SAMPLE_TYPE
echo "| Sample Type | ${SAMPLE_TYPE}" >> "$METADATA_FILE"
read -r -p "Organism from which the sample was obtained: " ORGANISM
echo "| Organism | ${ORGANISM}" >> "$METADATA_FILE"
read -r -p "Cell line (if applicable): " CELL_LINE
echo "| Cell Line | ${CELL_LINE}" >> "$METADATA_FILE"

# Sequencing Information
read -r -p "Library prep (if applicable): " LIBRARY_PREP
echo "| Library prep | ${LIBRARY_PREP}" >> "$METADATA_FILE"
read -r -p "Sequencing technology used (e.g., Illumina, PacBio, Oxford Nanopore): " SEQ_PLAT
echo "| Sequencing Platform | ${SEQ_PLAT}" >> "$METADATA_FILE"
read -r -p "Specific sequencing instrument used (e.g., HiSeq 2500, NovaSeq 6000): " SEQ_INSTR
echo "| Sequencing Instrument | ${SEQ_INSTR}" >> "$METADATA_FILE"
read -r -p "Length of the sequencing reads (e.g., 100 bp, 150 bp): " READ_LENGTH
echo "| Read Length | ${READ_LENGTH}" >> "$METADATA_FILE"
read -r -p "Indicates whether the sequencing was paired-end or single-end: " PAIRED_OR_SINGLE
echo "| Paired or Single-End | ${PAIRED_OR_SINGLE}" >> "$METADATA_FILE"
read -r -p "Average sequencing depth or coverage: " SEQ_DEPT
echo "| Sequencing Depth | ${SEQ_DEPT}" >> "$METADATA_FILE"

# Data Information
read -r -p "Format of the input data (e.g., FASTQ, BAM, VCF): " DATA_FORMAT
echo "| Data Format | ${DATA_FORMAT}" >> "$METADATA_FILE"
read -r -p "Location where the input data is stored: " DATA_LOCATION
echo "| Data Location | ${DATA_LOCATION}" >> "$METADATA_FILE"

# Analysis Information
read -r -p "Type of bioinformatics analysis requested (e.g., alignment, variant calling, RNA-seq): " ANA_TYPE
echo "| Analysis Type | ${ANA_TYPE}" >> "$METADATA_FILE"
read -r -p "Specific parameters or settings used for the analysis: " ANA_PARAMS
echo "| Analysis Parameters | ${ANA_PARAMS}" >> "$METADATA_FILE"
read -r -p "Reference genome used for the analysis (e.g., hg38, mm10): " REF_GENOME
echo "| Reference Genome | ${REF_GENOME}" >> "$METADATA_FILE"
read -r -p "Format of the output data (e.g., BAM, VCF, CSV): " OUT_FORMAT
echo "| Output Format | ${OUT_FORMAT}" >> "$METADATA_FILE"
read -r -p "Location where the output data will be stored: " OUT_LOCATION
echo "| Output Location | ${OUT_LOCATION}" >> "$METADATA_FILE"

# Billing Information
read -r -p "Funding: " FUNDING
echo "| Funding | ${FUNDING}" >> "$METADATA_FILE"

# Analyst Information
read -r -p "Name of the bioinformatician or analyst working on the service: " ANA_NAME
echo "| Analyst Name | ${ANA_NAME}" >> "$METADATA_FILE"
read -r -p "If published, link to publication: " PUB_LINK
echo "| Publication Link | ${PUB_LINK}" >> "$METADATA_FILE"
read -r -p "If data in public repository, link to repository: " PUB_REPO
echo "| Public Repository Link | ${PUB_REPO}" >> "$METADATA_FILE"

# Additional Information
read -r -p "Additional comments or special instructions: " COMMENTS
echo "| Comments | ${COMMENTS}" >> "$METADATA_FILE"

echo "Metadata collection complete. The details have been saved to ${METADATA_FILE}."

- A text file that you can copy
| Field | Description |
|-------------------------|-----------------------------------------------------------|
| Project Name | teste_metadata_script
| Project Description | testing the metadatascript
| Start Project Date | 29/04/2026
| Project Status | in progress
| User Name | Marie Hannaert
| User Email | marie.hannaert@uantwerpen.be
| Principal Investigator | Arvid
| Collaborator | Lauren Moons
| Service Type | analysis
| Sample Type | DNA
| Organism | human
| Cell Line | /
| Library prep | /
| Sequencing Platform | Oxford Nanopore
| Sequencing Instrument | /
| Read Length | /
| Paired or Single-End | PE
| Sequencing Depth | 10x
| Data Format | FASTQ
| Data Location | LTS hopefully
| Analysis Type | variant calling
| Analysis Parameters | /
| Reference Genome | hg38
| Output Format | BAM
| Output Location | here needed LTS
| Funding | none
| Analyst Name | Lauren Moons
| Publication Link | /
| Public Repository Link | not EGA
| Comments | test test
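Because every row follows the same `| Field | value` layout, a value can later be looked up from the command line. The following is a minimal sketch under that assumption. `get_field` is a hypothetical helper, not part of the script above, and the demo file is a temporary stand-in for a real metadata file. Values that themselves contain a `|` character would be cut off at that character.

```shell
#!/bin/bash
# Sketch: look up one field in a metadata file made of "| Field | value" rows.
# get_field is a hypothetical helper introduced for illustration.

get_field() {
    local file="$1" field="$2"
    awk -F'|' -v want="$field" '
        {
            name = $2; value = $3
            # Strip leading and trailing whitespace from both columns.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # Print the first matching value and stop reading.
            if (name == want) { print value; exit }
        }
    ' "$file"
}

# Demo on a small temporary file with the same layout as the example above.
demo="$(mktemp)"
printf '%s\n' \
    '| Field | Description |' \
    '|-------------------------|-----------|' \
    '| Organism | human' \
    '| Reference Genome | hg38' > "$demo"
get_field "$demo" "Reference Genome"   # prints: hg38
rm -f "$demo"
```

A lookup like this makes it easy to check, for example, which reference genome a directory was analysed against without opening the file.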