Tuesday, November 26, 2013

My Take on the FDA Warning to 23andMe

NOTE (8/5/2020): As I gained more experience finding problems that were not clear until after 5-10 years of research, I looked back at this post and realized that I no longer completely agree with my earlier responses.  So, I have separated my first response (Response #1) from the additional responses (Response #2).  I have also started keeping a change log for updates after this point.

NOTE: Getting Advice About Genetic Testing

I have recently participated in a forum discussion on Biostar about the newest warning to 23andMe from the FDA.  I think my responses will be of interest to a broader audience, so I have copied them here.  If I end up contributing more to the Biostar discussion, I will update my blog post as well.

Comment: The majority of published reactions seem to miss the FDA's major issue, which is about how the service is marketed and described.

Response #1: Yeah, I agree that the wording in this warning focuses mostly on marketing.

However, the FDA has previously tried to shut down DTC genomics companies (including 23andMe), and the 3rd paragraph of the letter seems to focus mostly on the accuracy of the test. The carrier status report should really be OK for diagnosis (and I think many of the specific associations mentioned in that warning are also pretty well established). Now, there are some caveats to some results, like deciding how to combine independent SNP risks and explaining the difference between mutations that guarantee the onset of a disease and mutations that merely modulate risk (which may only have a modest impact in many cases). I personally think 23andMe does a decent job of this already, but I'm sure there is always room for improvement.
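
As a side note on combining independent SNP risks: one common strategy (this is my rough illustration of the general idea, not necessarily 23andMe's exact method) is a multiplicative model, where per-genotype relative risks are multiplied together and applied to the average population risk. A minimal sketch in R, with entirely hypothetical numbers:

# hypothetical average lifetime risk for the condition
population_risk <- 0.05
# hypothetical relative risks for one person's genotypes at 3 independent SNPs
genotype_rr <- c(1.3, 0.9, 1.1)

combined_rr <- prod(genotype_rr)      # independence assumption: multiply relative risks
estimated_risk <- population_risk * combined_rr
round(estimated_risk, 4)

# Caveat: this approximation can exceed 1 for strong effects; converting to
# odds before multiplying is one way to keep the estimate bounded.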

In short, my understanding is that the problem was primarily with 23andMe's direct communication with the FDA regarding the technical benchmarks that would justify its marketing claims (and I think the FDA is supposed to grant permission based upon this data prior to advertising). This certainly relates to communication with customers, but I think delays in 23andMe's formal responses to the FDA were the primary problem.

Response #2: In retrospect, I don't think I should have used the phrase "shut down".

I might add additional thoughts, but this was the main thing that jumped out when I re-read the blog post.

Concern: There is at least one report of an individual with an inaccurate 23andMe report (click here to view).

Response #1: Make sure to read the entire article.  Once the bug was reported, it was fixed and the report was updated.

Response #2: Errors need to be made clear to customers.  You can see some notes like this among this collection of blog posts (which includes submission of multiple FDA MedWatch reports), but I don't believe any of this was made clear to other customers.

While communicating a lack of confidence in a result is sometimes uncomfortable, it still needs to be communicated.

There can also be negative consequences.  For example, if the trace for an automated Sanger sequence was not checked, then a sequencing error could produce a false positive for a pathogenic mutation.  If this was not caught before action was taken (which could be something like an unnecessary mastectomy), then permanent damage could result from providing a result prematurely and/or inaccurately.

That said, I don't want to over-emphasize the potential harm, and I think helping citizens become engaged in problem solving and critical assessment would be valuable (if genomics data / hypotheses were used for that purpose).

Concern: The issue isn't the bug. The issue is that 23andMe is offering a product while making claims about how customers can use the results for improved medical care. Medical professionals should be in charge of offering medical services.

Response: I agree that communication with customers is important and some customers may not have a good sense of what it is like to be part of a research project. It is possible that this is something 23andMe needs to work on.

However, I think the best solution is not to get rid of 23andMe, but rather help improve communication regarding the confidence of results. For example, I wrote a blog post about one possible solution after a previous FDA warning was issued:

http://cdwscience.blogspot.com/2010/08/benefits-to-3-tier-system-for-dtc.html

Also, I don't think this is a typical result. For example, here is a link to my results, as well as an article from Lifehacker (from someone with much less experience with genomics research). I'm sure I've seen more, but these are what I could think of off the top of my head.

http://cdwscience.blogspot.com/2011/02/thoughts-on-my-23andme-results.html

http://lifehacker.com/5802559/how-to-decode-your-dna-with-personal-genomics-service-23andme

Plus, I think an unfortunate reality is that this sort of thing will happen from time to time. I think pretty much all diagnostics will suffer from some degree of false positives, false negatives, and/or human error. I know I constantly have to update the bioinformatic programs that I design (for what I would call "research grade" analysis) - especially when hunting down bugs that are only apparent when analyzing a small number of data sets.

Concern: The fact that it turned out to just be a bug is one thing, it's pretty bad but they fixed it, but even if it had turned out to be true, it's still horrific. Imagine being sent an email saying, "Hey, you're going to get progressively disabled then die young," and no further information about what the condition is and how it's going to affect you, no kind sympathetic face offering you tissues and advice and options. This guy did his research and turned out to be fine, and that's great, but how many people fall into deep depression on getting this news? How many people kill themselves? You can't give out this kind of potentially devastating life-altering news in an email.

Response #1: For some people, I agree this may not be the best way to communicate results. This is probably why 23andMe adds an additional confirmation step for viewing results like this, which is not present for non-medical and non-predictive results. If you aren't prepared to view results on-line, then I would probably recommend not getting a 23andMe profile, not viewing that portion of the results, and/or contacting a genetic counselor to review the results with you. For example, my 23andMe report indicates that I am a carrier for cystic fibrosis, and they provide a link on that page about how to talk to a genetic counselor:

https://www.23andme.com/you/genetic_counseling/

That said, I think the concern overall is an over-reaction for the following reasons:

1) There have been several publications showing that most people respond to DTC genetic testing without serious distress. I can't list all the publications off the top of my head, but here is a summary of one such article:

http://www.nature.com/news/2011/110112/full/news.2011.12.html

2) In general, the accuracy of the genotyping portion of the test (currently an Illumina SNP array) is pretty good. For example, the FDA has recently approved Illumina sequencing for clinical applications. I'm also pretty sure 23andMe has checked the accuracy of the array by comparing results from regular 23andMe clients to results from people who participated in the exome sequencing pilot. That said, there is a difference between the interpretation of the carrier status results (which is relatively straightforward) and all of the other results, and my understanding is that the failure to communicate these benchmarks to the FDA is one of the legitimate complaints in the FDA letter.

3) I think people need to be careful and critical in all cases. For example, let's say my wife was also a cystic fibrosis carrier (identified via 23andMe) and we were thinking about kids. The first thing I would do is verify the result. For example, I could order a Counsyl test through a doctor (which might be a good alternative to 23andMe for some people, although I think you might still be viewing your results on-line) to verify that we were in fact both carriers. I wouldn't immediately run to an in vitro fertilization clinic. To be clear: I have family members who are confirmed cystic fibrosis carriers, as determined by standard testing, so I think the probability of my result being a false positive is very low, but I would still want to tread carefully. Plus, medical professionals can make wrong calls too; this has certainly happened to me, and at least one time it delayed hospitalization for a very serious infection. This is not an attack on the medical establishment: there is a reason I went to see the doctors in the first place. However, I think the actions taken by this individual were spot on. Regardless of whether something is FDA-approved, and regardless of whether a result comes from a person or a computer, if something doesn't sound right, you should seek a second opinion, do independent research, etc.

Response #3: As an update to 3), I later submitted my samples to multiple companies.  Using the raw data, I could confirm that I am a cystic fibrosis carrier.  However, multiple companies' reports said that I wasn't a carrier.  So, I think having the raw data and taking time to evaluate results is important.

Comment: There is now a class action lawsuit against 23andMe: http://gigaom.com/2013/12/02/23andme-hit-with-class-action-over-misleading-genetic-ads/

Response #1: That is unfortunate.

Response #2: While I hope issues can be resolved outside of court, I now agree that 23andMe (and AncestryDNA) ads can be misleading.  As one example, I had (and still have) serious concerns about the advertising of Airbnb destinations (as mentioned in this blog post).


While I usually kept the previous responses, I thought this paragraph should be changed (hence the different font color).  Essentially, I think the link below is worth reading, but my impression is now different.  I still think it was important to change the title (to avoid exaggerating the problem).  However, I think important points were also raised, and I apologize for not sufficiently appreciating that before:

I'm still hoping that most conflicts can be settled out of court (if that is still possible at this point). At least this provides a list of specific claims that I hope 23andMe will directly address to customers in an official statement; they can also reference a plethora of 3rd party experts who can generally back them up. I certainly think they made bad choices with the timing of the advertising and with providing terse official responses, but I don't think that should be a $5 million mistake (especially for a service that I assume is being provided below cost).

Update: I have also put together a survey on this topic. If you can fill out and/or distribute the survey, I would appreciate it!

Change Log:

11/26/2013 - public post date
8/5/2020 - start keeping change log with updated responses
8/6/2020 - continue to add revised responses

Tuesday, November 19, 2013

RNA-Seq Differential Expression Benchmarks

I recently published a paper whose primary purpose was to serve as a reference for the protocol that I use for RNA-Seq analysis (see main paper and supplemental figures).

The aspect of the paper that I think is most interesting to the genomics community is a comparison of statistical tools for defining differentially expressed genes, which had the greatest influence on the resulting gene lists (at least among the comparisons that I make in the paper).  So, I will review those relevant figures in this blog post.

The plots below show the robustness of the gene lists produced by each algorithm.  In other words, the higher the "common" line on the graph, the more robust the gene lists (i.e., the higher the proportion of genes commonly called by multiple algorithms; a small sketch of this overlap calculation appears after the figure captions below).  Most readers will probably not be as interested in the x-axis (the rounding factor for RPKM values), and it only changes the gene lists for Partek and sRAP.
Analysis of Patient Cohort (Tumor versus Normal).  1-factor is just tumor versus normal, while 2-factor also includes patient ID (pairing tumor and normal samples).  cuffdiff results are not shown because no genes were defined with FDR < 0.05.  sRAP results are not shown because the gene list was very small (see Figure S3 from the paper).
Analysis of Cell Line Comparison (Mutant versus WT)
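
For anyone who wants to reproduce this sort of overlap comparison with their own gene lists, here is a minimal sketch in R (the gene lists are hypothetical; the "common" value is the fraction of each algorithm's list that is also called by all of the other algorithms):

# hypothetical differentially expressed gene (DEG) lists from 3 algorithms
deg_lists <- list(
  Partek = c("TP53", "EGFR", "MYC", "BRCA1"),
  DESeq  = c("TP53", "EGFR", "MYC"),
  edgeR  = c("TP53", "EGFR", "KRAS")
)

# genes called by all algorithms
common_genes <- Reduce(intersect, deg_lists)

# proportion of each algorithm's list that is commonly called
sapply(deg_lists, function(genes) length(intersect(genes, common_genes)) / length(genes))
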
To be fair, I will certainly admit that robustness is not the same as accuracy.  Uniquely identified genes may be true positives that reflect a lower false negative rate.  However, the robustness results do correspond to some circumstantial evidence I've seen with other datasets, where cuffdiff and edgeR have given some strange results.  The results from this paper don't actually contain the clearest examples of this, but you can take a look at the GAGE4 stats to see an example where I would at least argue that edgeR provides inflated statistical significance.

Overall, I think Partek works the best (which is what I use for COH customers), but I was also pleased with DESeq (and sRAP, though I am obviously biased there).  In fact, these comparisons support earlier observations that DESeq is conservative in defining lists of differentially expressed genes (Robles et al. 2012).

However, my main goal is not to simply tell you what is the single best solution.  In fact, the cell line comparison above also had paired microarray data, and I would say the concordance between the two technologies was roughly similar for most algorithms:
RNA-Seq versus Microarray Gene lists.  "Microarray DEG" = proportion of differentially expressed genes in microarray data also present in RNA-Seq gene list.  "RNA-Seq DEG" = proportion of differentially expressed genes in RNA-Seq data also present in microarray gene list.


The similarity in microarray concordance reminds me of Figure 2a from Rapaport et al. 2013, which compares RNA-Seq gene lists to ~1000 qPCR-validated genes.  However, I think properly determining accuracy can be difficult.  For example, look at the differences between the qPCR results in Figure 2a and the ERCC spike-ins in Figure S5 of that same paper.

Instead, these are the main take-home points I would like to emphasize:

1) Simple methods that compare RPKM values (in this case, rounded and log2-transformed) can work at least as well for defining differentially expressed genes as more complicated methods that are unique to RNA-Seq analysis (at least for gene-level comparisons); see the sketch after this list.  For example, one claim against count-based methods in general (including edgeR, DESeq, etc.) is that there can be confounding factors, such as changes in splicing patterns.  Although I agree this is a theoretical problem that probably does occur to some extent, it doesn't seem to be a major factor influencing concordance with microarray data, qPCR validation, etc.

2) There is probably not a single solution that works best in all situations. In this paper, you can see that the results look very different for the patient versus cell line datasets.  For practical reasons, a lot of benchmarks will probably use cell line datasets.  However, it is not safe to assume that performance on large patient cohorts will be comparable to performance on cell line data (or on patient data with few or no biological replicates).
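
As promised above, here is a minimal sketch of the simple strategy in point 1, using simulated values (this is my rough illustration of the idea; sRAP implements a more complete version, and the exact test may differ):

set.seed(0)
# simulated RPKM matrix: 20 genes x 10 samples
rpkm <- matrix(2^rnorm(200, mean = 3), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10)))
group <- factor(rep(c("tumor", "normal"), each = 5))

floor_value <- 0.1                        # the "rounding factor" on the x-axis above
log_expr <- log2(pmax(rpkm, floor_value)) # round low values up, then log2-transform

# standard two-group test per gene, followed by FDR correction
p_values <- apply(log_expr, 1, function(x) t.test(x ~ group)$p.value)
fdr <- p.adjust(p_values, method = "BH")
head(sort(fdr))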

Monday, November 18, 2013

My American Gut Individual Report

I recently received my American Gut Individual Report for my fecal sample in the mail.  Click here to see this report as a PDF.

The first thing that I noticed was that the phylum distributions were very different from what I calculated from my raw data (click here to see those results).  Namely, ~75% of my reads aligned to Proteobacteria 16S rRNAs, but my individual report said that only ~10% of my sample was from Proteobacteria.  So, I contacted the American Gut team to ask why the results were so different.  They said that the shipping process appears to allow differential growth of certain bacteria (especially Gammaproteobacteria) that would not normally be abundant in fresh samples.  So, they filter out likely contaminants for the report.

This is something that I would like to learn more about, and the American Gut team said that they are actively investigating it.  The filtering code is available on the American Gut GitHub website, but I am most curious to see population-level metrics about this differential growth.  Namely, I would like to see how the reduction in false positives affects the true positive rate.  At least in my case, ~70% of my reads are being filtered out as potential contaminants, which seems like a loss of a lot of information. Additionally, my filtered sample seems to cluster with samples that have relatively low Firmicutes counts (much closer to the original value of ~15%; see the PCA plot in the lower right-hand corner), when the report says the percentage should be ~65% Firmicutes (bar plot).  In other words, my guess is that the "best" interpretation of my results may lie somewhere between these two reports (where the true Firmicutes abundance may be lower, and the true Proteobacteria abundance may be higher, than the filtered report indicates).
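
To make the arithmetic concrete, here is a small sketch in R with made-up read counts (chosen only to roughly mimic the percentages described above; these are not my actual per-phylum counts):

# hypothetical raw 16S read counts per phylum (total = 100,000 reads)
raw_counts  <- c(Proteobacteria = 75000, Firmicutes = 15000, Bacteroidetes = 8000, Other = 2000)
# hypothetical reads flagged as shipping-related contaminants
contaminant <- c(Proteobacteria = 72500, Firmicutes = 0,     Bacteroidetes = 2000, Other = 500)

round(100 * raw_counts / sum(raw_counts), 1)   # raw profile (~75% Proteobacteria)

filtered <- raw_counts - contaminant
round(100 * filtered / sum(filtered), 1)       # filtered profile (Proteobacteria drops to ~10%)

sum(contaminant) / sum(raw_counts)             # fraction of reads removed (~75% in this example)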

That said, I know the American Gut analysis is ongoing, and there will likely be additional future reports.  For example, I know for certain that there will eventually be a report for my oral sample.  It will be interesting to see if the results of future reports change as new scientific findings are discovered.

Tuesday, November 5, 2013

Metagenomic Profiles for American Gut Subjects with Migraines

As someone who experiences migraines, I know that diet can affect the onset of a migraine.  Therefore, I was interested to see if there were any metagenomic differences among subjects with migraines from the American Gut project.

After filtering for FASTQ files greater than 1 MB, I found that there were 77 American Gut subjects who suffered from migraines (at least a few times a year).  I calculated abundance levels for phylum, class, and species using MG-RAST (with alignments to RDP).  You can see my previous post for more details.

First, I wanted to see which other variables were correlated with migraine status (in order to make sure I was characterizing metagenomic changes specifically associated with migraines rather than secondary factors); a minimal sketch of this screen appears after the table below.  I found two factors significantly associated with migraine status: BMI (along with other weight-related variables; migraine subjects had a higher BMI) and sex (migraine subjects were more likely to be women).

METADATA           t-test p-value
CARBOHYDRATE_PER   0.46
AGE                0.79
TOT_MASS           0.039
PROTEIN_PER        0.16
PLANT_PER          0.97
BMI                0.017
HEIGHT_IN          0.23
SEX                0.0039
FAT_PER            0.93
HEIGHT_OR_LENGTH   0.23
WEIGHT_LBS         0.039
ANIMAL_PER         0.95
LATITUDE           0.55
LONGITUDE          0.82
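
As mentioned above, here is a minimal sketch of that screen in R (with a simulated stand-in for the American Gut metadata; the variable names match the table, but the values are made up):

set.seed(1)
# simulated metadata: one row per subject
metadata <- data.frame(
  migraine = factor(rep(c("migraine", "control"), each = 70)),
  BMI      = c(rnorm(70, mean = 26, sd = 4), rnorm(70, mean = 24, sd = 4)),
  AGE      = rnorm(140, mean = 45, sd = 12)
)

# t-test p-value for each numeric variable versus migraine status
sapply(c("BMI", "AGE"), function(v) t.test(metadata[[v]] ~ metadata$migraine)$p.value)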

So, I then collected normal controls matched for BMI and sex.  There were a few migraine subjects missing sex and/or BMI information, so I only included 70 matched controls.  To be fair, this matching might not have mattered too much: for example, the preliminary American Gut report noted that profiles didn't seem very different between the sexes.  However, I thought it was best to err on the side of caution.

Unsurprisingly, there was not any substantial clustering separating migraine and non-migraine subjects (otherwise, I would have expected this to be included in the American Gut preliminary report):



Unfortunately, I didn't see any strong clustering based upon migraine frequency either:



FYI, I am showing PCA plots based upon species-level abundances, limited to species with an average abundance of at least 100 counts per sample, but you can also view the case-versus-control phylum and class distributions as well as the migraine frequency phylum and class distributions.
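
Here is a minimal sketch of how such a PCA plot can be produced in R (with a simulated abundance matrix; I am also assuming a log2 transformation for the illustration, rather than stating the exact transformation used for the plots above):

set.seed(2)
# simulated abundance matrix: rows = species, columns = subjects (counts per thousand)
abundance <- matrix(rpois(50 * 140, lambda = 150), nrow = 50,
                    dimnames = list(paste0("species", 1:50), paste0("subject", 1:140)))

# keep species with an average abundance of at least 100 counts per sample
abundant <- abundance[rowMeans(abundance) >= 100, ]

# PCA on log2-transformed abundances (one point per subject)
pca <- prcomp(t(log2(abundant + 1)), scale. = TRUE)
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2")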

Although I didn't see any major differences in the PCA plots, I went ahead and looked for individual differences that occurred with a false discovery rate (FDR) less than 0.05.  I used the sRAP package for this analysis, treating the 16S rRNA abundances like gene expression values.  Although most of these results looked like artifacts of having only one subject with daily migraines, I did consider Lactococcus lactis (p = 0.00088, FDR = 0.042) to be an interesting candidate:

y-axis is abundance (counts per thousand) on a log2 scale


I double-checked the frequency of lactose intolerance among migraine subjects: the frequency was qualitatively higher among subjects with frequent migraines than among subjects with less frequent migraines (50% versus 28.6%), but this difference was not statistically significant (Fisher's exact p-value = 0.28).  Additionally, I'm not sure about the interpretation if this Lactococcus lactis trend is found to be reproducible in other independent cohorts.  For example, it seems that Lactococcus lactis can regulate riboflavin production (Burgess et al. 2004), and riboflavin has supposedly been used in the treatment of migraines.  In other words, causality may be hard to establish: perhaps subjects with severe migraine problems are taking probiotic supplements.  For example, the majority of American Gut participants reported taking some sort of supplement (I couldn't find a metadata variable specific to probiotics).  In fact, it has been reported that Lactococcus lactis might alleviate symptoms of lactose intolerance (Li et al. 2012), and the individual with daily migraine occurrences was lactose intolerant.
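
For reference, here is a minimal sketch of that Fisher's exact test in R. The 2x2 counts below are hypothetical, chosen only to roughly match the reported proportions (I am not reproducing the actual subject counts here):

# hypothetical 2x2 table: migraine frequency versus lactose intolerance
lactose_table <- matrix(c( 5,  5,    # frequent migraines:      5/10 lactose intolerant (50%)
                           6, 15),   # less frequent migraines: 6/21 lactose intolerant (28.6%)
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("frequent", "less_frequent"),
                                        c("intolerant", "tolerant")))
fisher.test(lactose_table)$p.value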

I certainly applaud the work done by the American Gut project: from what I could tell, they were the only major metagenomic consortium that collected migraine metadata.  However, I would feel more confident about these results if there were more subjects who commonly experienced migraines and/or if there was longitudinal data (to track metagenomic profiles during intervals when migraine subjects did or did not experience a migraine).

Of course, I should also point out that I wouldn't consider the analysis presented in this post to be comprehensive.  I would certainly be interested in seeing what other conclusions could be drawn from this data.  In fact, the processed data is publicly available in MG-RAST:

Migraine Subjects: http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=6547

Matched Control Subjects: http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeProject&project=6594

If desired, you can also download my tables (normalized to counts per thousand) for the phylum, class, and species distributions.
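
For clarity, here is a minimal sketch of the counts-per-thousand normalization used for those tables (with a made-up count matrix):

# made-up count matrix: rows = taxa, columns = subjects
counts <- matrix(c(750, 150,  80,  20,
                   120, 600, 200,  80),
                 nrow = 4,
                 dimnames = list(paste0("taxon", 1:4), c("subjectA", "subjectB")))

# scale each column so that it sums to 1,000
cpt <- sweep(counts, 2, colSums(counts), "/") * 1000
colSums(cpt)   # check: each column now sums to 1000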
 
Creative Commons License
My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.