Society’s ability to generate data at scale has far outpaced its ability to interpret it, but a new company is trying to change that by looking at an old tool in a new way.
Launched in early May at the Oscon conference in Austin, Texas, Pilosa is a new generation of indexing technology that decouples the index from data storage and optimizes it for massive scale by applying a bitmap index to high-cardinality data, CEO Higinio Maycotte said.
Society is generating new data at a rate that far outpaces Moore’s Law. That volume makes data harder to interpret at scale, because retrieval technology has fallen behind the technology that generates the data.
“It’s going to solve a major problem for everyone who works with data sets of one terabyte or more,” Mr. Maycotte said. “Pilosa makes a terabyte of data respond to queries as if it were 10 megabytes.”
(Image: Pilosa processing speeds)
Scientific research involving proteins is a data-intensive area, Mr. Maycotte said. Most existing models can accommodate only a small fraction of the proteins in the human body, but scientists can employ Pilosa’s models to capture the entire data set.
“Genomic analyses can be completed orders of magnitude faster,” Mr. Maycotte said.
Mr. Maycotte used the simple example of my favorite shirt colors. Pilosa represents that question by assigning a 1 or a 0 to my like or dislike of every color. Data in this binary form is highly compressible. If someone wants to know the favorite shirt colors of thousands or millions of people, that can be determined easily, along with a host of related factors.
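To make the shirt-color example concrete, here is a minimal Python sketch of a bitmap index under a simplified model: each color gets one bitmap (a "row"), each person is one bit position (a "column"), and Python's arbitrary-precision integers stand in for bit sets. The names `ColorIndex`, `set_like`, `count_all`, `count_any` and `run_lengths` are hypothetical illustrations, not Pilosa's actual API. A question such as "how many people like both blue and red?" reduces to a bitwise AND followed by a count of the 1 bits, and `run_lengths` hints at why sparse bitmaps compress well.

```python
# Hypothetical bitmap-index sketch; NOT Pilosa's API or storage format.
# One bitmap per color (row), one bit per person (column).
from typing import Dict, Iterable, List, Tuple


def _popcount(bits: int) -> int:
    # Count the 1 bits in a non-negative integer.
    return bin(bits).count("1")


class ColorIndex:
    def __init__(self) -> None:
        # color -> Python int used as an uncompressed bit set
        self.rows: Dict[str, int] = {}

    def set_like(self, person_id: int, color: str) -> None:
        # Setting bit `person_id` to 1 records that this person likes `color`.
        self.rows[color] = self.rows.get(color, 0) | (1 << person_id)

    def likes(self, color: str) -> List[int]:
        # List the person IDs whose bit is set in this color's row.
        bits = self.rows.get(color, 0)
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

    def count(self, color: str) -> int:
        return _popcount(self.rows.get(color, 0))

    def count_all(self, colors: Iterable[str]) -> int:
        # Intersection (AND): people who like every listed color.
        result = None
        for color in colors:
            bits = self.rows.get(color, 0)
            result = bits if result is None else result & bits
        return 0 if result is None else _popcount(result)

    def count_any(self, colors: Iterable[str]) -> int:
        # Union (OR): people who like at least one listed color.
        result = 0
        for color in colors:
            result |= self.rows.get(color, 0)
        return _popcount(result)


def run_lengths(bits: int, width: int) -> List[Tuple[int, int]]:
    # Run-length encode the first `width` bits as (bit_value, run_length) pairs,
    # a simple illustration of why long runs of 0s compress well.
    runs: List[Tuple[int, int]] = []
    for i in range(width):
        b = (bits >> i) & 1
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


if __name__ == "__main__":
    idx = ColorIndex()
    for person, color in [(0, "blue"), (0, "red"), (1, "blue"), (2, "blue"), (2, "green")]:
        idx.set_like(person, color)
    print(idx.count("blue"))                  # 3
    print(idx.likes("blue"))                  # [0, 1, 2]
    print(idx.count_all(["blue", "red"]))     # 1
    print(idx.count_any(["red", "green"]))    # 2
    print(run_lengths(1 | (1 << 999), 1000))  # [(1, 1), (0, 998), (1, 1)]
```

A thousand-bit row with only two people set collapses to three runs. Production bitmap indexes usually store rows in a dedicated compressed format (Roaring bitmaps are a common choice) rather than as plain integers, but the queries remain the same set algebra of AND, OR and counting.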
“We want to sit on top of some of the largest data sets in the world,” Mr. Maycotte said. “Our pilot projects include moonshot initiatives like cancer research. Joining and asking questions of multiple whole genomes simultaneously is exactly the kind of work Pilosa was built to help accomplish.”