Saturday, August 29, 2015

Study delivers bleak verdict on validity of psychology experiment results by Ian Sample

Of 100 studies published in top-ranking journals in 2008, 75% of social psychology experiments and half of cognitive studies failed the replication test

Psychology experiments are failing the replication test – for good reason

A major investigation into scores of claims made in psychology research journals has delivered a bleak verdict on the state of the science.
An international team of experts repeated 100 experiments published in top psychology journals and found that they could reproduce only 36% of original findings.
The study, which saw 270 scientists repeat experiments on five continents, was launched by psychologists in the US in response to rising concerns over the reliability of psychology research.
“There is no doubt that I would have loved for the effects to be more reproducible,” said Brian Nosek, a professor of psychology who led the study at the University of Virginia. “I am disappointed, in the sense that I think we can do better.”
“The key caution that an average reader should take away is any one study is not going to be the last word,” he added. “Science is a process of uncertainty reduction, and no one study is almost ever a definitive result on its own.”
All of the experiments the scientists repeated appeared in top ranking journals in 2008 and fell into two broad categories, namely cognitive and social psychology. Cognitive psychology is concerned with basic operations of the mind, and studies tend to look at areas such as perception, attention and memory. Social psychology looks at more social issues, such as self esteem, identity, prejudice and how people interact.
In the investigation, a whopping 75% of the social psychology experiments were not replicated, meaning that the originally reported findings vanished when other scientists repeated the experiments. Half of the cognitive psychology studies failed the same test. Details are published in the journal Science.
Even when scientists could replicate original findings, the sizes of the effects they found were on average half as big as reported first time around. That could be due to scientists leaving out data that undermined their hypotheses, and by journals accepting only the strongest claims for publication.
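The selective-publication mechanism described in the paragraph above can be sketched in a few lines of Python. This is a toy simulation, not the study's actual method: all numbers (a true effect of 0.3, groups of 30, a crude z-test) are illustrative assumptions. When only "significant" results get published, the published effect sizes systematically overshoot the true effect.

```python
import random
import statistics

random.seed(0)

def one_study(true_effect=0.3, n=30):
    """Simulate one two-group study; return (observed effect, significant?)."""
    treatment = [random.gauss(true_effect, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = (1 / n + 1 / n) ** 0.5   # standard error of the difference in means
    return diff, abs(diff / se) > 1.96

results = [one_study() for _ in range(5000)]
all_effects = [d for d, _ in results]
published = [d for d, sig in results if sig]  # journals keep only "significant" findings

print(f"true effect:            0.30")
print(f"mean of all studies:    {statistics.mean(all_effects):.2f}")
print(f"mean of published only: {statistics.mean(published):.2f}")
```

Running this shows the published-only average landing well above the true effect, which is exactly why replications, drawing from *all* attempts rather than just the lucky ones, tend to find roughly halved effect sizes.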
Despite the grim findings, Nosek said the results presented an opportunity to understand and fix the problem. “Scepticism is a core part of science and we need to embrace it. If the evidence is tentative, you should be sceptical of your evidence. We should be our own worst critics,” he told the Guardian. One initiative now underway calls for psychologists to submit their research questions and proposed methods to probe them for review before they start their experiments.
John Ioannidis, professor of health research and policy at Stanford University, said the study was impressive and that its results had been eagerly awaited by the scientific community. “Sadly, the picture it paints - a 64% failure rate even among papers published in the best journals in the field - is not very nice about the current status of psychological science in general, and for fields like social psychology it is just devastating,” he said.
But he urged people to focus on the positives. The results, he hopes, will improve research practices in psychology and across the sciences more generally, where similar problems of reproducibility have been found before. In 2005, Ioannidis published a seminal study that explained why most published research findings are false.
Marcus Munafo, a co-author on the study and professor of psychology at Bristol University, said: “I think it’s a problem across the board, because wherever people have looked, they have found similar issues.” In 2013, he published a report with Ioannidis that found serious statistical weaknesses were common in neuroscience studies.
Nosek’s study is unlikely to boost morale among psychologists, but the findings simply reflect how science works. In trying to understand how the world works, scientists must ask important questions and take risks in finding ways to answer them. Missteps are inevitable if scientists are to avoid complacency. As Alan Kraut at the Association for Psychological Science puts it: “The only finding that will replicate 100% of the time is likely to be trite, boring and probably already known: yes, dead people can never be taught to read.”
There are many reasons why a study might not replicate. Scientists could use a slightly different method second time around, or perform the experiment under different conditions. They might fail to find the original effect by chance. None of these would negate the original finding. Another possibility is that the original result was a false positive.
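The "failure by chance" point above can be made concrete with a toy power simulation (the effect size and sample sizes are illustrative assumptions, not figures from the study): even when an effect is perfectly real, an underpowered replication will often miss it.

```python
import random
import statistics

random.seed(1)

def study_significant(true_effect, n, alpha_z=1.96):
    """One two-group study; True if the group difference reaches p < .05 (z-test)."""
    a = [random.gauss(true_effect, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = (2 / n) ** 0.5
    return abs(diff / se) > alpha_z

trials = 2000
powers = {}
for n in (20, 50, 200):
    hits = sum(study_significant(0.3, n) for _ in range(trials))
    powers[n] = hits / trials
    print(f"n = {n:3d} per group: replicates {powers[n]:.0%} of the time")
```

With small samples the true effect "replicates" only a minority of the time, so a single failed replication, like a single original study, is weak evidence on its own.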
Among the experiments that stood up was one that found people are equally adept at recognising pride in faces from different cultures. Another backed up a finding that revealed the brain regions activated when people were given fair offers in a financial game. One study that failed replication claimed that encouraging people to believe there was no such thing as free will made them cheat more.
Munafo said that the problem of poor reproducibility is exacerbated by the way modern science works. “If I want to get promoted or get a grant, I need to be writing lots of papers. But writing lots of papers and doing lots of small experiments isn’t the way to get one really robust right answer,” he said. “What it takes to be a successful academic is not necessarily that well aligned with what it takes to be a good scientist.”

Wednesday, August 12, 2015

Health Experts Say Coca-Cola is Funding its Own Science to Deliberately Mislead the Public by George Dvorsky

The Global Energy Balance Network – a research institute supported by Coca-Cola – is claiming that exercise, not diet, is the best way to prevent weight gain. It’s a dubious and self-serving message that’s not going over well among diet and obesity experts.
“Most of the focus in the popular media and in the scientific press is that they’re eating too much, eating too much, eating too much, blaming fast food, blaming sugary drinks and so on,” claims Dr. Steve Blair at the Global Energy Balance Network website. “And there’s really virtually no compelling evidence that that in fact is the cause. Those of us interested in science, public health, medicine, we have to learn how to get the right information out there.”
As reported at CBC News, Blair’s extraordinary claim, along with an accompanying video, recently caught the attention of Ottawa-based obesity expert Dr. Yoni Freedhoff. During his ensuing investigation to find the origin of these claims, he discovered that the network is receiving financial and logistical support from Coca-Cola, which isn’t something that was previously disclosed. Alarmed, Freedhoff contacted Anahad O’Connor from The New York Times to get the word out.
In his ensuing article, “Coca-Cola Funds Scientists Who Shift Blame for Obesity Away From Bad Diets,” O’Connor writes:
Health experts say this message is misleading and part of an effort by Coke to deflect criticism about the role sugary drinks have played in the spread of obesity and Type 2 diabetes. They contend that the company is using the new group to convince the public that physical activity can offset a bad diet despite evidence that exercise has only minimal impact on weight compared with what people consume.
This clash over the science of obesity comes in a period of rising efforts to tax sugary drinks, remove them from schools and stop companies from marketing them to children. In the last two decades, consumption of full-calorie sodas by the average American has dropped by 25 percent.
“Coca-Cola’s sales are slipping, and there’s this huge political and public backlash against soda, with every major city trying to do something to curb consumption,” said Michele Simon, a public health lawyer. “This is a direct response to the ways that the company is losing. They’re desperate to stop the bleeding.”
Over at Scientific American, Dina Fine Maron spoke to diet and behavior expert Charlotte Markey to learn if people can lose weight with exercise alone. Here’s what Markey had to say:
I find everything going on here very troubling. In the promotional video from Coke’s group, linked to by the NYT, exercise scientist Steve Blair says we don’t know what is causing obesity and we need more research. That message is oversimplified and terribly misleading. We actually know a great deal about what leads to obesity. It’s not a great mystery. People are eating too much and not exercising enough…that makes it inevitable that people will be obese. The group’s emphasis on physical activity is misleading based on what the data shows. There’s no data to support saying if you exercise for 30 minutes three times a week that this will take care of the problem. We have data refuting that.
In reality, we need people to stop drinking sugary beverages like soda. Soda is the one consumable beverage that is repeatedly cited as having the biggest impact on obesity rates. From a public health standpoint, we want soda out of schools and we want cities to really decrease intake of soda—and Coca-Cola knows this and knows they are being proactive and defensive against taxes on soda and other limitations.
Very disturbing. This issue bears a startling resemblance to the efforts of cigarette manufacturers to deliberately mislead the public about the health risks of smoking.

Sunday, March 29, 2015

Dozens Of Scientific Papers Withdrawn After Peer-Review Fraud Uncovered by Stephen Luntz

Scientific publisher BioMed Central has withdrawn 43 papers, and is investigating many more, over what it calls the “fabrication” of peer reviews. Representatives of journal editors have admitted the papers are the tip of a dangerous iceberg, and the scandal may lead to an overhaul of how peer review is conducted.
Peer review is fundamental to science, a central part of the process of self-correction that sets it apart from faith-based systems. True peer review does not end with publication; plenty of scientific papers are published only to subsequently be shown to have major flaws. However, the initial process, whereby editors of scientific publications send work, usually anonymized, to other researchers for checking, is meant to filter out the worst mistakes.
That failed for a number of the 277 journals BioMed Central publishes, with researchers finding ways to review their own papers, or those of friends. The problem may be far more widespread, and BioMed Central may be ahead of the curve in picking the issue up.
The Committee on Publication Ethics (COPE) issued a statement saying its members have “become aware of systematic, inappropriate attempts to manipulate the peer review processes of several journals across different publishers.” COPE started out as an effort by a small group of medical journal editors to raise the standards of academic publication. It now has a membership of 9,000 editors from across academic fields, and its growth is indicative of concerns about the challenges facing the peer review process.
According to the COPE release, “These manipulations appear to have been orchestrated by a number of third party agencies offering services to authors.”
This is not the first example of a “peer review and citation ring”. Sixty papers were withdrawn last year as a result of a similar discovery. However, those papers were restricted to a single journal. This time the problem seems to be far more widespread.
So far all the papers withdrawn by BioMed Central have had authors based in China, often at leading institutions such as China Medical University, but BioMed Central says the problem is an international one, reflecting the pressure researchers are under to publish quickly.
Concerns about peer review have been growing for decades. Reviewers are almost always already struggling under the burden of their own research and teaching load. Few are paid for their efforts, and even fewer get credit from their employers for this vital contribution to the advancement of science. Many admit off the record to not giving papers the attention they deserve.
Although bad papers can send scientific research down blind alleys, bad work more often simply gets ignored by others in the field. However, authors can be rewarded with funding that should have gone to someone else. Media outlets, whether mainstream or science-specific, usually have no choice but to rely on publication in a peer reviewed journal as the test of whether work justifies publicity.
When fraud is exposed it can have devastating consequences for innocent co-workers, and is fodder for science's enemies.
In these cases, since the research was medical-related, there is also the danger of treatments being approved for clinical use on the basis of flawed studies.

Tuesday, February 3, 2015

Dictionary of Research Concepts and Issues

The e-Pub version of this dictionary is available to research students free of charge.

This is in e-Pub format and will run on a wide range of devices, from tablets to laptops to desktop computers. It has been developed to work with standard e-Pub reader software, which may be easily downloaded for free.

If any of your students would like an e-copy of this book, please advise them to complete the request form and e-mail it to us, and a free e-copy of the book will be e-mailed to him or her.

Sunday, February 1, 2015

Test shows big data text analysis inconsistent, inaccurate by Kevin Fogarty

Big data analytic systems are reputed to be capable of finding a needle in a universe of haystacks without having to know what a needle looks like.

Even the simplest part of that process – sorting all the data available into Haystacks and Not Haystacks so the analytics can at least work with data that is relevant – requires a topical analysis that uses the metadata accompanying each giant pile of data to classify each bit according to topic as well as source, format and other criteria.

One of the best ways to sort large databases of unstructured text is to use a technique called latent Dirichlet allocation (LDA) – a modeling technique that identifies text within documents as belonging to a limited number of still-unknown topics, groups the documents according to how likely it is that they refer to the same topic, then backtracks to identify what those topics actually are. (Here's the full explanation in the Journal of Machine Learning Research; here's Wikipedia's.)
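A minimal sketch of that workflow, assuming scikit-learn's `LatentDirichletAllocation` (the article names no particular implementation; this is just one common one, and the toy corpus is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four toy documents, two per underlying topic (astronomy vs. politics).
docs = [
    "star galaxy telescope orbit star",
    "galaxy orbit telescope star nebula",
    "election vote senate policy vote",
    "senate policy election campaign vote",
]

counts = CountVectorizer().fit_transform(docs)          # bag-of-words matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)                  # per-document topic mixtures

# Each row sums to 1: the probability that the document belongs to each topic.
for doc, mix in zip(docs, doc_topics):
    print(f"{mix.round(2)}  {doc[:30]}")
```

The model is told only "find 2 topics"; what those topics *are* is recovered afterwards by inspecting which words dominate each one.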

LDA is "the state of the art in topic modeling," according to an analysis published Thursday in the American Physical Society's journal Physical Review X, which said that, in the 10 years since its introduction, LDA had become one of the most common ways to accomplish the computationally difficult problem of automatically classifying specific parts of human language into a context-appropriate category.

Unfortunately, LDA is also inaccurate enough at some tasks that the results of any topic model created with it are essentially meaningless, according to Luis Amaral, a physicist whose specialty is the mathematical analysis of complex systems and networks in the real world and one of the senior researchers on the multidisciplinary team from Northwestern University that wrote the paper.

The team tested LDA-based analysis with repeated analyses of the same set of unstructured data – 23,000 scientific papers and 1.2 million Wikipedia articles written in several different languages.

Even worse than being inaccurate, the LDA analyses were inconsistent, returning the same results only 80 percent of the time even when using the same data and the same analytic configuration.

Accuracy of 90 percent with 80 percent consistency sounds good, but the scores are "actually very poor, since they are for an exceedingly easy case," Amaral said in an announcement from Northwestern about the study.

Applied to messy, inconsistently scrubbed data from many sources in many formats – the base of data for which big data is often praised for its ability to manage – the results would be far less accurate and far less reproducible, according to the paper.

"Our systematic analysis clearly demonstrates that current implementations of LDA have low validity," the paper reports (full text PDF here).

The team created an alternative method called TopicMapping, which first breaks words down into bases (treating "stars" and "star" as the same word), then eliminates conjunctions, pronouns and other "stop words" that modify the meaning but not the topic, using a standardized list.

The algorithm then builds a model identifying words that often appear together in the same document, and uses the proprietary Infomap natural-language processing software to assign those clusters of words to groups, each identified as a "community" that defines a topic. Words can appear in more than one topic area.
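The stem-strip-cluster pipeline described above can be sketched with networkx. Assumptions to note: the paper uses Infomap for community detection, while this sketch substitutes networkx's greedy modularity communities as a stand-in, and the stemmer, stop-word list, and corpus are toy simplifications.

```python
from collections import Counter
from itertools import combinations
from networkx.algorithms.community import greedy_modularity_communities
import networkx as nx

STOP = {"the", "a", "of", "and", "in", "on"}    # toy stop-word list

def stem(word):
    """Crude stemmer for the sketch: strip a trailing 's' (so stars -> star)."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

docs = [
    "the stars and the galaxy in the telescope",
    "a galaxy of stars in orbit",
    "the senate votes on the policy",
    "a policy of votes in the senate",
]

# Count how often two (stemmed, non-stop) words co-occur in a document.
edges = Counter()
for doc in docs:
    words = {stem(w) for w in doc.split() if w not in STOP}
    for pair in combinations(sorted(words), 2):
        edges[pair] += 1

G = nx.Graph()
for (u, v), w in edges.items():
    G.add_edge(u, v, weight=w)

# Community detection groups co-occurring words; each community is a "topic".
communities = greedy_modularity_communities(G, weight="weight")
for i, com in enumerate(communities):
    print(f"topic {i}: {sorted(com)}")
```

On this toy corpus the word network splits into an astronomy cluster and a politics cluster, which is the sense in which a "community" of words defines a topic.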

The new approach delivered results that were 92 percent accurate and 98 percent reproducible, though, according to the paper, it only moderately improved the likelihood that any given result would be accurate.

The real point was not to replace LDA with TopicMapping, but to demonstrate that the topic-analysis method that has become one of the most commonly used in big data analysis is far less accurate and far less consistent than previously believed.

The best way to improve those analyses, according to Amaral, is to apply techniques common in community detection algorithms – which identify connections among specific variables and use those to help categorize or verify the classification of those that aren't clearly in one group or another.

Without that kind of improvement – and real-world testing of the results of big data analyses – companies using LDA-based text analysis could be making decisions based on results whose accuracy they can't know for sure.

"Companies that make products must show that their products work," Amaral said in the Northwestern release. "They must be certified. There is no such case for algorithms. We have a lot of uninformed consumers of big data algorithms that are using tools that haven't been tested for reproducibility and accuracy."

Thursday, January 29, 2015

New algorithm can separate unstructured text into topics with high accuracy and reproducibility by Emily Ayshford

Much of our data sits in large databases of unstructured text. Finding insights among emails, text documents, and websites is extremely difficult unless we can search, characterize, and classify that text data in a meaningful way.

One of the leading big data algorithms for finding related topics within unstructured text (an area called topic modeling) is latent Dirichlet allocation (LDA). But when Northwestern University professor Luis Amaral set out to test LDA, he found that it was neither as accurate nor reproducible as a leading topic modeling algorithm should be.

Using his network analysis background, Amaral, professor of chemical and biological engineering in Northwestern's McCormick School of Engineering and Applied Science, developed a new topic modeling algorithm that has shown very high accuracy and reproducibility during tests. His results, published with co-author Konrad Kording, associate professor of physical medicine and rehabilitation, physiology, and applied mathematics at Northwestern, were published Jan. 29 in Physical Review X.

Topic modeling algorithms take unstructured text and find a set of topics that can be used to describe each document in the set. They are the workhorses of big data science, used as the foundation for recommendation systems, spam filtering, and digital image processing. The LDA topic modeling algorithm was developed in 2003 and has been widely used for academic research and for commercial applications, like search engines.

When Amaral explored how LDA worked, he found that the algorithm produced different results each time for the same set of data, and it often did so inaccurately. Amaral and his group tested LDA by running it on documents they created that were written in English, French, Spanish, and other languages. By doing this, they were able to prevent text overlap among documents.
"In this simple case, the algorithm should be able to perform at 100 percent accuracy and reproducibility," he said. But when LDA was used, it separated these documents into similar groups with only 90 percent accuracy and 80 percent reproducibility. "While these numbers may appear to be good, they are actually very poor, since they are for an exceedingly easy case," Amaral said.
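The "exceedingly easy case" above can be mimicked in a few lines: documents in different languages share essentially no vocabulary, so a perfect topic model should separate them with 100% accuracy and give the same grouping on every run. This sketch assumes scikit-learn's LDA implementation and a tiny invented two-language corpus, not the paper's actual test set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Two "languages" with (almost) disjoint vocabularies.
english = ["the cat sat on the mat", "a dog ran in the park"]
spanish = ["el gato duerme en la casa", "un perro corre en el parque"]
docs = english + spanish

counts = CountVectorizer().fit_transform(docs)

def dominant_topics(seed):
    """Fit LDA with a given random seed; return each document's top topic."""
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    return lda.fit_transform(counts).argmax(axis=1)

# Reproducibility check: do two runs with different seeds group documents the same way?
run_a, run_b = dominant_topics(0), dominant_topics(1)
same_grouping = all(
    (run_a[i] == run_a[j]) == (run_b[i] == run_b[j])
    for i in range(len(docs)) for j in range(i + 1, len(docs))
)
print("runs agree on the grouping:", same_grouping)
```

Comparing *groupings* rather than raw topic labels matters because topic numbering is arbitrary: run A's "topic 0" may be run B's "topic 1" even when the two runs agree perfectly.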

To create a better algorithm, Amaral took a network approach. The result, called TopicMapping, begins by preprocessing data to replace words with their stem (so "star" and "stars" would be considered the same word). It then builds a network of connecting words and identifies a "community" of related words (just as one could look for communities of people in Facebook). The words within a given community define a topic.

The algorithm was able to perfectly separate the documents according to language and was able to reproduce its results. It also had high accuracy and reproducibility when separating 23,000 scientific papers and 1.2 million Wikipedia articles by topic.

These results show the need for more testing of big data algorithms and more research into making them more accurate and reproducible, Amaral said.

"Companies that make products must show that their products work," he said. "They must be certified. There is no such case for algorithms. We have a lot of uninformed consumers of big data algorithms that are using tools that haven't been tested for reproducibility and accuracy."

Friday, January 9, 2015

Top 10 Clever Google Search Tricks by Whitson Gordon

10. Use Google to Search Certain Sites

If you really like a web site but its search tool isn't very good, fret not—Google almost always does a better job, and you can use it to search that site with a simple operator. For example, if you want to find an old Lifehacker article, just type site:lifehacker.com before your search terms (e.g. site:lifehacker.com hackintosh). The same goes for your favorite forums, blogs, and even web services. In fact, it's actually really good for finding free audiobooks, searching for free stuff without the spam, and more.

9. Find Product Names, Recipes, and More with Reverse Image Search

Google's reverse image search is great if you're looking for the source of a photo, wallpaper, or more images like that. However, reverse image search is also great for searching out information—like finding out who makes the chair in this picture, or how to make the meal in this photo. Just punch in an image like you normally would, but look at Google's regular results instead of the image results—you'll probably find a lot.

8. Get "Wildcard" Suggestions Through Autocomplete

A lot of advanced search engines let you put a * in the middle of your terms to denote "anything." Google does too, but it doesn't always work the way you want. However, you can still get wildcard suggestions, of a sort, by typing in a full phrase in Google and then deleting the word you want to replace. For example, you can search for how to jailbreak an iphone and remove one word to see all the suggestions for how to ____ an iphone.

7. Find Free Downloads of Any Type

Ever needed an old Android app but couldn't find the APK for what you were looking for? Or wanted an MP3 but couldn't find the right version? Google has a few search tools that, when used together, can unlock a plethora of downloads: inurl:, intitle:, and filetype:. For example, to find free Android APKs, you'd search for -inurl:htm -inurl:html intitle:"index of" apk to see site indexes of stored APK files. You can use this to find Android apps, music files, free ebooks, comic books, and more. Check out the linked posts for more information.

6. Discover Alternatives to Popular Sites, Apps, and Products

You've probably searched for comparisons on Google before, like roku vs apple tv. But what if you don't know what you want to compare a product to, or you want to see what other competitors are out there? Just type in roku vs and see what Google's autocomplete adds. It'll most likely list the most popular competitors to the Roku so you know what else to check out. You can also search for better than roku to see alternatives, too.

5. Access Google Cache Directly from the Search Bar

We all know Google Cache can be a great tool, but there's no need to search for the page and then hunt for that "Cached" link: just type cache: before that site's URL (e.g. cache:lifehacker.com). If Google has the site in its cache, it'll pull it right up for you. If you want to simplify the process even more, this bookmarklet is handy to have around. It's great for seeing an old version of a page, accessing a site when it's down, or getting past something like the SOPA blackout.

4. Bypass Paywalls, Blocked Sites, and More with a Google Proxy

You may already know that you can sometimes bypass paywalls, get around blocked sites, and download files by funneling a site through Google Translate or Google Mobilizer. That's a clever search trick in and of itself, but just like Google Cache, you can make the process a lot faster by keeping a few URLs on hand. Just add the URL you want to visit to the end of the Google proxy URL and you're good to go. Check out the full list of proxies, along with bookmarklets to make them even easier, here.

3. Search for People on Google Images

Some people's names are also real-world objects—like "Rose" or "Paris." If you're looking for a person and not a flower, just search for rose and add &imgtype=face to the end of your search URL, as shown above. Google will redo the search but return results that it recognizes as faces!
Update: Reader unclghost kindly pointed out that we're working with outdated information here—this trick is now built into Google's UI! Just head to Search Tools > Type and you can choose from faces, photos, clip art, line drawings, and even animations. Thanks for the tip!

2. Get More Precise Time-Based Search Results

You've probably seen the option in Google that lets you filter results by time, such as the past hour, day, or week. But if you want something more specific—like the past 10 minutes—you can do so with a URL hack. Just add &tbs=qdr: to the end of the URL, along with the time you want to search: h5 for 5 hours, n5 for 5 minutes, or s5 for 5 seconds (substituting any number you want). So, to search within the past 10 minutes, you'd add &tbs=qdr:n10 to your URL. It's handy for getting the most up-to-the-minute news.
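If you build these URLs often, the hack above is easy to script. This sketch assumes Google's standard /search endpoint with the q and tbs parameters described in this section; the helper name is invented for illustration.

```python
from urllib.parse import urlencode

def google_search_url(query, qdr=None):
    """Build a Google search URL, optionally time-filtered via the tbs=qdr: hack.

    qdr examples: "h5" = past 5 hours, "n10" = past 10 minutes, "s30" = past 30 seconds.
    """
    params = {"q": query}
    if qdr:
        params["tbs"] = f"qdr:{qdr}"
    return "https://www.google.com/search?" + urlencode(params)

print(google_search_url("breaking news", qdr="n10"))
# -> https://www.google.com/search?q=breaking+news&tbs=qdr%3An10
```

Note that urlencode percent-escapes the colon (qdr%3An10); Google accepts both the escaped and the raw form.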

1. Refine Your Search Terms with Advanced Operators

Okay, so this isn't so much a "clever use" as it is a tool everyone should have in their pocket. For everything Google can do, so few of us actually use the tools at our disposal. You probably already know you can search multiple terms with AND or OR, but have you ever used AROUND? AROUND is a halfway point between regular search terms (like white teeth) and using quotes (like "white teeth"). AROUND(2), for example, ensures that the two words are close to each other, but not necessarily in a specific order. You can tweak the range with a higher or lower number in the parentheses.
Similarly, if you want to exclude a word entirely, you can add a dash before it—like justin bieber -sucks if you want sites that only speak of Justin Bieber in a positive light. You can also use this to exclude other parameters, like excluding a site you don't like with -site:. Check out our guide to tweaking your Google searches for more of these tips, and you can also find a pretty solid list over at weblog Marc and Angel Hack Life. Search on!