Human Microbiome Project

We're not individuals, we're colonies of creatures.

— Bruce Birren, co-director, Genome Sequencing and Analysis Program

The human body is home to an enormous number and diversity of microbes. Within the body of a healthy adult, microbial cells are estimated to outnumber human cells 10-fold. The combined genetic contributions of these microbes, in excess of 100,000 protein-coding genes, may provide essential traits not encoded in our own genome yet required for normal human development, physiology, immunity, and nutrition. The mission of the Human Microbiome Project (HMP) is to generate resources to describe these microbial communities and to analyze their roles in health and disease. For more information on the Human Microbiome Project, visit the Data Analysis and Coordination Center.

The HMP is not a single project but an interdisciplinary effort established by the NIH Common Fund. Multi-year grants were awarded to four sequencing centers (the Broad Institute, the Baylor College of Medicine, Washington University School of Medicine, and the J. Craig Venter Institute) to characterize the microbial communities found at several different sites on the human body, including the airway, oral cavity, skin, gastrointestinal tract, and vagina, and to analyze the role of these microbes in human health and disease. The HMP includes the following initiatives:

  • Development of a reference set of microbial genome sequences and preliminary characterization of the human microbiome
  • Elucidation of the relationship between disease and changes in the human microbiome
  • Development of new tools and technologies for computational analysis
  • Establishment of a data analysis and coordinating center (DACC)
  • Establishment of resource repositories
  • Examination of the ethical, legal and social implications (ELSI) of HMP research

Reference Genomes

Although the number of microbial species associated with the human body is predicted to be many times the number of reference genomes sequenced in this project, the strains selected for sequencing were chosen to be as phylogenetically representative as possible. The resulting collection establishes a valuable benchmark for understanding microbial communities and will enable more complex metagenomic studies.

Specifically, for each body habitat, microbial strains were selected based on community feedback, phylogenetic diversity, clinical importance, abundance within the body site, and significance for understanding genetic diversity within a species. In addition, since not all human-associated microbes can be easily cultured for sequencing, the HMP sought to define the more elusive ("most wanted") organisms from the human microbiome. These previously uncultured organisms have been hunted by leading developers of both single-cell and novel culture-based methods for sequencing as part of the HMP's Reference Genome collection. Data from the most wanted list is helping to complete the catalog of reference genomes from the microbiome.

The HMP DACC hosts the full list of genomes sequenced for the HMP. A first paper on the HMP reference collection is available on PubMed.


One of the main goals of the HMP was to create a baseline view of the healthy human microbiome in five major areas (airways, skin, oral cavity, gastrointestinal tract, and vagina) and to make this resource available to the broad scientific community. Characterizing the baseline state of the microbiota is a critical first step in determining how altered microbial states contribute to disease. The HMP produced the first consistent sampling of many clinically relevant body habitats within a large population, including paired 16S rRNA gene profiling and deep metagenomic sequencing coverage for hundreds of microbial communities. Sequencing the 16S rRNA gene provides a method for identifying the various species within a bacterial/archaeal community, even those organisms that cannot be grown in the laboratory, while deep metagenomic sequencing provides a complementary view of the genes and pathways present in the community.

Women were sampled at 18 body habitats and men at 15 (excluding three vaginal sites), distributed among five major body areas. Nine specimens were collected from the oral cavity and oropharynx: saliva, buccal mucosa (cheek), keratinized gingiva (gums), palate, tonsils, throat and tongue soft tissues, and supra- and subgingival dental plaque (tooth biofilm above and below the gum). Four skin specimens were collected from the two retroauricular creases (behind each ear) and the two antecubital fossae (inner elbows), and one specimen was collected from the anterior nares (nostrils). A self-collected stool specimen represented the microbiota of the lower gastrointestinal tract, and three vaginal specimens were collected from the vaginal introitus, midpoint and posterior fornix. To evaluate within-subject stability of the microbiome, individuals were sampled at additional time points (up to three total). More details on data generation are provided in the related Nature HMP publication.
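As a quick check on the sampling design described above, the per-area specimen counts can be tallied to confirm the totals of 18 habitats for women and 15 for men. This is only an illustrative sketch: the dictionary layout, the area labels, and the function name are assumptions made for this example and are not part of any HMP tool or data format.

```python
# Specimen counts per major body area, as stated in the text above.
# The keys and structure are illustrative, not an official HMP schema.
SITES_BY_AREA = {
    "oral cavity and oropharynx": 9,   # saliva, cheek, gums, palate, tonsils, throat, tongue, two plaque sites
    "skin": 4,                         # two retroauricular creases, two antecubital fossae
    "airways": 1,                      # anterior nares
    "gastrointestinal tract": 1,       # self-collected stool
    "vagina": 3,                       # introitus, midpoint, posterior fornix
}


def habitats_sampled(include_vaginal: bool) -> int:
    """Sum the body habitats sampled, optionally leaving out the vaginal sites."""
    return sum(
        count
        for area, count in SITES_BY_AREA.items()
        if include_vaginal or area != "vagina"
    )


women_total = habitats_sampled(include_vaginal=True)
men_total = habitats_sampled(include_vaginal=False)
print(f"women: {women_total} habitats, men: {men_total} habitats")
```

The tally covers the five major body areas named in the text, and the two totals differ by exactly the three vaginal sites.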

A total of 11,174 samples were collected from 300 adults (ref: pubmed 23165986). In the first phase, 16S and whole-genome shotgun (WGS) data were generated for 5,177 taxonomically characterized communities (16S) and 681 WGS samples. A complete overview of all data sets, the processed products derived from them, and detailed descriptions of how the data were processed is available at the DACC.

Develop and share new technologies and bioinformatics tools

The complexity of, and variation between, bacterial species require sophisticated methods to understand the function of bacterial communities. In close collaboration with other genomics centers, the Broad Institute is pioneering the application of new technologies and developing standards for evaluating the data that are generated.

This work includes sequence data from several organisms that have been generated using new sequencing technologies. Data and additional information about the projects can be obtained at the National Institutes of Health.

Broad has also developed a growing suite of utilities for analysis of 16S rRNA gene sequences (Microbiome Utilities portal) and, in collaboration with others, has been at the forefront of developing cutting-edge DNA sequencing and microbial community analysis pipelines: LEfSe (Segata et al., 2011), HUMAnN (Abubucker et al., in press), ChimeraSlayer (Haas et al., 2011), and Picard. These tools allow us to identify organismal and functional biomarkers for health and disease, as well as associations with available metadata. For a more detailed perspective on bioinformatics for the Human Microbiome Project, see also Gevers et al., 2012, PLoS Computational Biology.


Human Microbiome Project data can also be found at NCBI.


Please cite all data relating to this initiative (including individual genes and genomes) as:
"Human Microbiome U54 initiative, Broad Institute ("