Yenny Webb-Vargas

Finding the needle in the haystack with Mendeley and Evernote

4/29/2014


 
This is my love note for two free programs that manage metadata: Mendeley and Evernote. 

Mendeley is a bibliography app that has a social network revolving around publications. I use it mainly to maintain a library with the pdf versions of all the useful papers I have come across, and it stores all their publication information too. 

Evernote is an app with the slogan 'remember everything'. It manages and stores notes, which can include HTML-formatted text, pdf files, and images, and can have any file attached(!). Evernote can also take audio, screenshot, and webcam notes, as well as handwritten notes taken on a tablet. The first three can hold all the features I described, while the handwritten notes cannot.

The thing about these two is that they store a huge amount of entries and have the power to let me find what I am looking for. But let me be more specific: 
  1. They can search within the pdf documents and within notes. In Mendeley, you can add notes, highlight text, and add notes to the highlighted text. In Evernote, you can add pdfs to your notes.
  2. They manage, store, and index meta-data. To me, meta-data consists of tags, folder structure, and added characteristics of an object (like the title, authors, and year of publication of an article in pdf). Right now, as I am writing this in my desktop Evernote app, I have tagged this note as 'blog' and 'blog idea'. It will pop up when I subset using the tags, and I can include these tags in my search. In Mendeley, I can subset using the tags that accompany the required readings for the causal inference class for which I am a teaching assistant, and since I made a tag for each lecture topic, it is quite easy for me to find those articles. Also, tags allow for tracking objects that belong to two different categories at the same time; folders do not allow for this (thanks to Gmail and Google Docs for teaching me this). My favorite use of notes on highlighted text in Mendeley is to mark statements that are directly applicable to my research, like 'this theorem can be used in _____ paper'. Consistency in the usage of tags is crucial when searching.
  3. Automatic meta-data. Mendeley extracts meta-data from the pdfs AND complements it with a web search in Google Scholar. A well-groomed library, for me, is one that has all the information from the articles, book chapters, webpages, and working papers, and with Mendeley, most of the job is done automatically. When the library is groomed, it is much easier to get the correct bibliography from Mendeley. It can automatically create BibTeX files of your library and has an app to build bibliographies in Microsoft Word. As for Evernote, there is a web clipper app that, when you clip a page, automatically sets the folder and tags that match the content of the clipped webpage (and you can remove them if they are not appropriate). Yeah, Evernote learns what you do.
  4. They are backed up on the cloud. Also, they are integrated with iOS and Android. Mendeley is developing their Android application, but there are other already-developed apps that use Mendeley's online storage of my library. The quality of the integration between Android Evernote and Desktop Evernote comes down to how fast it can sync between the two of them (if you update a note simultaneously on the desktop and on your phone, it now shows conflicting notes and keeps both versions).
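
The point in item 2 about tags versus folders can be made concrete with a small sketch. This is only an illustration of the idea, not how Mendeley or Evernote store things internally; the entry names, the tags, and the helper functions `build_tag_index` and `find_by_tags` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical library entries: names and tags are invented for illustration.
library = {
    "Paper A": {"causal inference", "lecture 3"},
    "Paper B": {"causal inference", "mediation"},
    "Note C": {"blog", "blog idea"},
}


def build_tag_index(items):
    """Map each tag to the set of item names that carry it."""
    index = defaultdict(set)
    for name, tags in items.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find_by_tags(index, *tags):
    """Return the items that carry every one of the given tags."""
    if not tags:
        return set()
    # index.get avoids creating empty entries for unknown tags.
    matches = [index.get(tag, set()) for tag in tags]
    return set.intersection(*matches)


tag_index = build_tag_index(library)

# One item can sit under several tags at once, unlike a single folder.
print(sorted(find_by_tags(tag_index, "causal inference")))
print(sorted(find_by_tags(tag_index, "causal inference", "mediation")))
```

An item lives in exactly one folder, but it can appear under as many tags as it needs, and combining tags narrows the search. A tag spelled two different ways would split the index, which is why consistent tagging matters.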

Post a comment if you know of another program that works great with meta-data, especially if it is one about file structure in the drive. I would love it if Dropbox/Cubby/Copy had something like this too. As far as I can tell, Google Drive cannot handle meta-data, although it doesn't seem to need it that much given its search capabilities.



Data ScienceS

1/24/2014


 

I have had many discussions with my peers about how we perceive 'data science' and how it compares to what we do in biostatistics. I would like to share my perspective on the issue: mainly, that there are many data sciences, and that (what I think we call) data science and biostatistics are two examples.

In biostatistics, we develop and apply methods that help us learn what happens when people are exposed to something (epidemiology), when we put people in an MRI machine and ask them to tap their fingers or think unhappy thoughts (neuroimaging), when we apply a treatment (clinical trials, biomedical experiments, any kind of experiment…), among many other situations. This list comes from the types of open problems that I have seen at medical and public health schools.

But really, these methods that crunch data into knowledge can be (and are) applied in any field of knowledge. Think econometrics, chemometrics, biometrics*, psychometrics; these are all fields that measure something and apply statistics.

So, statistics helps us gain knowledge from data. It provides a platform in which we can put our assumptions on paper and state clearly the question we really want to ask. Given our straightforward question and our assumptions (which represent the way we think the world works), we come up with an answer. Does smoking cause cancer? Is emotional pain similar to physical pain? Is HRT associated with more heart disease? This is applying statistics.

Now, statisticians develop methods that can be used in any quantitative field, but so can economists who have trained deeper in statistics (some people call them closet statisticians; I bet they prefer 'econometricians'). Moreover, statistics quantifies the uncertainty in our answers, which is why it is used in many fields. It also measures how good a method is (think about properties of an estimator like its consistency, its variance, its distribution). The kick about biostatistics is that it develops methods for biomedical research.

On the other hand, data science comes forward with new methods (or old ones) to apply to data that has recently become available, like internet traffic data or data derived from small online businesses. Nevertheless, I think the heart of what it does is very close to what applying statistics is. They both want to use data to answer questions about the world. The difference is just like the difference between econometrics and chemometrics. Each field has different goals. I don't know about economics, but I don't think economists focus on determining precisely the parts (and their quantities) that constitute a solution, which is one of chemometrics's goals. Each field has different methods, but they can also share plenty of them. The instrumental variable framework developed in economics can really help in neuroimaging.

Each field has its own way of doing the actual experiment, which involves collecting data and processing it. So, just as one titrates samples to measure how much chlorine there is in a solution, and just as one has to interview people and measure them for a cohort study, there is a way to gather information derived from web services. Knowing how to find out how many people see this post and other posts by my friends, how many Groupons for grooming are available in Baltimore, or how to actually code an ab-test are golden skills to have in the new field of 'data science' (check out the Coursera Data Science Specialization offered by Hopkins, and Jeff Leek's comment on it).
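
To give a flavor of what coding an ab-test (A/B test) can look like, here is a minimal sketch in Python. It randomizes simulated visitors to two versions of a page and compares conversion rates with a pooled two-proportion z-test. Everything in it is an assumption made for illustration: the visitor count, the "true" conversion rates, the seed, and the function name `two_proportion_z_test` are hypothetical, and a real test would read assignments and outcomes from a web service instead of simulating them.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Simulated experiment. The rates below are made up, not real data.
rng = random.Random(42)
true_rates = {"A": 0.10, "B": 0.12}
counts = {"A": {"conversions": 0, "visitors": 0},
          "B": {"conversions": 0, "visitors": 0}}

for _ in range(5000):
    arm = rng.choice(["A", "B"])  # randomization: a coin flip per visitor
    counts[arm]["visitors"] += 1
    if rng.random() < true_rates[arm]:
        counts[arm]["conversions"] += 1

z, p_value = two_proportion_z_test(
    counts["A"]["conversions"], counts["A"]["visitors"],
    counts["B"]["conversions"], counts["B"]["visitors"],
)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

The coin flip per visitor is what ties this to the rest of statistics: because assignment is random, a difference in conversion rates can be read as an effect of the page version, just as in a randomized clinical trial.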

My point is: they are all data sciences. Among the *metrics, we could rename data science as 'webmetrics™'. Scratch that, because somebody else already registered it. Let's call it e-metrics, which could include all the data that is being collected by smartphones. Wait, 'e-metric™' was a trademark abandoned in 2002; can we use it? And then there is 'businessmetrics' (I am just making up words). With my zero knowledge of finance, I am tempted to just call it finance.

PS. Loved this post by Rachel Schutt on ab-testing and causal inference; it goes through data science, ab-tests, randomization, and a bit of causal inference.

PS2. Also, this post by Kathy Orlinsky on the study of pain perception, which further describes the study that could answer whether emotional pain is similar to physical pain. This post, though, did not cite her use of the awesome Hyperbole and a Half blog post.

* The Wikipedia page on biometrics misses the big picture on what biometrics is.


    Author

    Yenny Webb-Vargas
    Biostatistics PhD Student
    Johns Hopkins Bloomberg School of Public Health
