Deep learning breakthrough by Rice University scientists

In an earlier article on deep learning, we discussed how inference workloads – using an already trained neural network to analyze data – can run on fairly cheap hardware, but running the training workload by which the neural network “learns” is orders of magnitude more expensive.

In particular, the more potential inputs you have for an algorithm, the more out of control your scaling problem becomes when analyzing the problem space. This is where MACH, a research project by Rice University’s Tharun Medini and Anshumali Shrivastava, comes in. MACH is an acronym for Merged Average Classifiers via Hashing, and according to lead researcher Shrivastava, “[its] training times are about 7-10 times faster, and … memory footprints are 2-4 times smaller” than those of previous large-scale deep learning techniques.

In describing the scale of extreme classification problems, Medini points to online shopping searches, noting that “there are easily more than 100 million products online.” If anything, this is conservative – one data company claimed that Amazon alone sold 606 million separate products, with the entire company offering more than three billion products worldwide. Another company estimates the US product count at 353 million. Medini continues: “A neural network that takes search input and predicts from 100 million outputs, or products, will typically end up with about 2,000 parameters per product. So you multiply those, and the final layer of the neural network is now 200 billion parameters. I am talking about a very, very dead simple neural network model.”
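
To make that multiplication concrete, here is a minimal back-of-the-envelope sketch using only the figures Medini quotes:

```python
# Back-of-the-envelope count for the final layer, using Medini's figures.
num_products = 100_000_000      # ~100 million possible outputs (products)
params_per_product = 2_000      # parameters per product in the output layer

final_layer_params = num_products * params_per_product
print(f"{final_layer_params:,} final-layer parameters")   # 200,000,000,000
```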

At that scale, a supercomputer would likely need terabytes of memory just to store the model. The memory problem gets even worse once you bring GPUs into the picture. GPUs can process neural network workloads far faster than general-purpose CPUs, but each GPU has a relatively small amount of RAM – even the most expensive Nvidia Tesla GPUs have only 32 GB. Medini says: “Training such a model is prohibitively expensive because of the enormous communication between GPUs.”
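
As a rough illustration of why, here is a sketch of the memory footprint, assuming standard 32-bit floats (the article does not specify a precision):

```python
# Rough memory estimate for storing 200 billion parameters (assuming 32-bit floats).
final_layer_params = 200_000_000_000
bytes_per_param = 4

weights_bytes = final_layer_params * bytes_per_param      # 800 GB just for the weights
gpu_ram_bytes = 32 * 1024**3                              # 32 GB on a top-end Tesla GPU

print(f"Weights alone: {weights_bytes / 1e9:.0f} GB")
print(f"GPUs needed just to hold them: {weights_bytes / gpu_ram_bytes:.0f}")
# Training also needs gradients and optimizer state on top of the weights,
# which pushes the total well into the terabytes mentioned above.
```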

Instead of training on the full 100 million outcomes – product purchases, in this example – MACH divides them into three “buckets,” each containing 33.3 million randomly selected outcomes. MACH then creates another “world,” and in that world the 100 million outcomes are again randomly sorted into three buckets. Crucially, the random sorting is independent between World One and World Two – each has the same 100 million outcomes, but their random division into buckets differs from world to world.
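
Below is a minimal sketch of that bucketing step. The hash-based assignment is an illustrative stand-in for however the researchers actually map products to buckets; the property that matters is that each world groups the same classes into buckets independently of the others:

```python
import hashlib

NUM_BUCKETS = 3   # buckets per world, as in the article's example
NUM_WORLDS = 2    # independent random groupings ("worlds")

def bucket_of(class_id: int, world: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map a class to a bucket, independently per world.

    Seeding the hash with the world index stands in for independent random
    sorting: every world sees the same classes but groups them differently.
    """
    digest = hashlib.sha256(f"{world}:{class_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

# The same product generally lands in different buckets in different worlds.
product_id = 42_123_456
print([bucket_of(product_id, w) for w in range(NUM_WORLDS)])
```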

With each world instantiated, a search query is fed to both the “world one” classifier and the “world two” classifier, each with only three possible outcomes. “What is this person thinking about?” Shrivastava asks. “The most probable class is something that is common between these two buckets.”
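
Here is a hedged sketch of one way that merging could work, consistent with the “Merged Average Classifiers” name: each world’s small classifier outputs a probability per bucket, and a candidate product is scored by averaging the probability of whichever bucket it falls into in each world. The `bucket_of` helper and the toy probabilities are assumptions for illustration, not the researchers’ actual code:

```python
import hashlib

NUM_BUCKETS = 3

def bucket_of(class_id, world, num_buckets=NUM_BUCKETS):
    digest = hashlib.sha256(f"{world}:{class_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def score_candidates(candidate_ids, world_bucket_probs):
    """Average each candidate's bucket probability across worlds."""
    scores = {}
    for cid in candidate_ids:
        per_world = [probs[bucket_of(cid, w)] for w, probs in enumerate(world_bucket_probs)]
        scores[cid] = sum(per_world) / len(per_world)
    return scores

# Toy softmax outputs for one search query: two worlds, three buckets each.
world_probs = [[0.7, 0.2, 0.1],   # world one's classifier
               [0.1, 0.1, 0.8]]   # world two's classifier
best = max(score_candidates(range(100), world_probs).items(), key=lambda kv: kv[1])
print(best)   # the candidate whose buckets score well in both worlds
```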

At this point there are nine possible outcomes: three buckets in World One times three buckets in World Two. But MACH only needed to create six classes – the three buckets in World One plus the three buckets in World Two – to model that nine-outcome search space. This advantage improves as more “worlds” are created; a three-world approach yields 27 outcomes from just nine created classes, a four-world setup produces 81 outcomes from 12 classes, and so on. “I pay a linear cost and I get an exponential improvement,” says Shrivastava.
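
The arithmetic behind that “linear cost, exponential improvement” claim is easy to check:

```python
# Classifier outputs grow linearly with the number of worlds,
# while the distinguishable outcomes grow exponentially.
buckets_per_world = 3
for worlds in range(2, 6):
    outputs_trained = worlds * buckets_per_world       # classes MACH must create
    outcomes_covered = buckets_per_world ** worlds      # combinations it can distinguish
    print(f"{worlds} worlds: {outputs_trained} classes -> {outcomes_covered} outcomes")
# 2 worlds: 6 classes -> 9 outcomes
# 3 worlds: 9 classes -> 27 outcomes
# 4 worlds: 12 classes -> 81 outcomes
# 5 worlds: 15 classes -> 243 outcomes
```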

Better yet, MACH lends itself to distributed computing on smaller individual instances. The worlds “don’t even have to talk to each other,” says Medini. “In principle, you could train each [world] on a single GPU, which you could never do with a non-independent approach.” In the real world, the researchers applied MACH to a 49-million-product Amazon training database, randomly sorting the products into 10,000 buckets in each of 32 separate worlds. That reduced the required parameters in the model by more than an order of magnitude – and, according to Medini, training the model required both less time and less memory than some of the best reported training times on models with comparable parameters.
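
Applying the same arithmetic to the Amazon experiment gives a sense of the reduction. This sketch uses only the figures quoted above and assumes the per-output parameter count is the same in both setups:

```python
# Final-layer output units with and without MACH, using the Amazon experiment's figures.
num_products = 49_000_000
buckets_per_world = 10_000
num_worlds = 32

flat_outputs = num_products                       # one output unit per product
mach_outputs = buckets_per_world * num_worlds     # 320,000 output units in total
print(f"Reduction in output units: {flat_outputs / mach_outputs:.0f}x")   # ~153x
```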

This would of course not be an Ars article about deep learning if we didn’t end it with a cynical reminder of unintended consequences. The unspoken reality is that the neural network is not actually learning to show shoppers what they asked for. Instead, it learns how to turn searches into purchases. The neural network neither knows nor cares what the person was actually looking for; it only has an idea of what that person is most likely to buy – and without adequate oversight, systems trained to increase the odds in this way can end up suggesting baby products to women who have suffered miscarriages, or worse.
