Metadata is data that describes other data and provides the context needed for future use. Documentation and metadata are key to understanding a dataset and to promoting reproducibility and replicability.
Reference: Qin, J., & Zeng, M. L. (2020). Metadata. ALA Neal-Schuman.
Here is a made-up example of some data. It appears to be a study about treatment and smoking, but it is unclear what the treatments are, and nothing explains what Smoking_Cat means. Quite a number of questions therefore need to be answered before the data can be useful or interpretable.
Here is the same data with its corresponding metadata. Notice that the columns now have better descriptors, with units displayed. The descriptors also make clear that these were baseline measurements rather than mid- or post-treatment measurements. The treatment acronyms are explained at the bottom, along with dosing. Smoking_Cat is defined as Smoking Categories: the amount of smoking a person does, placed into numeric bins.
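One way to keep this kind of metadata alongside the data is a small machine-readable data dictionary. The sketch below is illustrative only: apart from Smoking_Cat, every column name, description, and code is a hypothetical placeholder, not a value from the example, and the helper undocumented_columns is an assumed name introduced here.

```python
# Hypothetical data dictionary. Only the column name Smoking_Cat comes from
# the example; all other names, descriptions, and codes are placeholders.
data_dictionary = {
    "Subject_ID": {
        "description": "Anonymized subject identifier",
        "units": None,
    },
    "Treatment": {
        "description": "Assigned treatment arm",
        "units": None,
        "codes": {"A": "placeholder treatment A", "B": "placeholder treatment B"},
    },
    "Smoking_Cat": {
        "description": "Smoking category: amount of smoking, binned numerically",
        "units": None,
        "codes": {0: "placeholder bin 0", 1: "placeholder bin 1"},
    },
}


def undocumented_columns(columns, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    return [name for name in columns if name not in dictionary]


# Hypothetical column list for a dataset; "Weight" has no metadata entry.
columns = ["Subject_ID", "Treatment", "Smoking_Cat", "Weight"]
missing = undocumented_columns(columns, data_dictionary)
print(missing)
```

A check like this flags columns that would leave a future reader guessing, which is the problem the first version of the example data has.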
The example is not perfect, though, as the type of smoking (most likely cigarettes in this case, as opposed to vaping) is not described. The protocol is also missing. For example, how were subjects chosen for this particular study? Were there any exclusion criteria, such as age or presence of disease?