Glisten uses computer vision to break down product photos to their most important parts

It's amazing that in this day and age, the best way to search for new clothes is to click a few check boxes and then scroll through endless pictures. Why can't you search for "green patterned scoop neck dress" and see one? Glisten is a new startup enabling just that by using computer vision to understand and list the most important aspects of the products in any photo.

Now, you may think this already exists. In a way, it does, but not in a way that's helpful.
Co-founder Sarah Wooders encountered this while working on a fashion search project of her own as a student at MIT.

"I was procrastinating by shopping online, and I searched for v-neck crop shirt, and only like two things came up. But when I scrolled through there were 20 or so," she said. "I realized things were tagged in very inconsistent ways, and if the data is that gross when consumers see it, it's probably even worse in the backend."

As it turns out, computer vision systems have been trained to identify, really quite effectively, features of all kinds of images, from identifying dog breeds to recognizing facial expressions.
When it comes to fashion and other relatively complex products, they do the same sort of thing: look at the image and generate a list of features with corresponding confidence levels. So for a given image, the system would produce a sort of tag list.

As you can imagine, that's actually pretty useful. But it also leaves a lot to be desired. The system doesn't really understand what "maroon" and "sleeve" mean, except that they're present in this image.
If you asked the system what color the shirt is, it would be stumped unless you manually sorted through the list and said, these two things are colors, these are styles, these are variations of styles, and so on.

That's not hard to do for one image, but a clothing retailer might have thousands of products, each with a dozen pictures, and new ones coming in weekly. Do you want to be the intern assigned to copying and pasting tags into sorted fields? No, and neither does anyone else.
That's the problem Glisten solves, by making the computer vision engine considerably more context-aware and its outputs much more useful. Glisten's system processes the same image into labeled fields rather than a flat list of tags.

"Our API response will be actually, the neckline is this, the color is this, the pattern is this," Wooders said.

That kind of structured data can be plugged far more easily into a database and queried with confidence.
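
To make the difference concrete, here is a minimal Python sketch of turning a flat tag list into structured attributes. It is an illustration only, not Glisten's API: the tag names, the confidence values, the TAG_TO_ATTRIBUTE table and the structure_tags function are all assumptions made up for this example, and the hand-written lookup table stands in for whatever context-aware model does this job in a real system.

```python
from collections import defaultdict

# Hypothetical lookup from a raw tag to the attribute it describes.
# A real system would infer this context; it is hard-coded here
# purely for illustration.
TAG_TO_ATTRIBUTE = {
    "maroon": "color",
    "red": "color",
    "long sleeve": "sleeve",
    "crew neck": "neckline",
    "solid": "pattern",
}


def structure_tags(flat_tags):
    """Group (tag, confidence) pairs by attribute, keeping the best tag for each."""
    by_attribute = defaultdict(list)
    for tag, confidence in flat_tags:
        attribute = TAG_TO_ATTRIBUTE.get(tag)
        if attribute is None:
            continue  # no known attribute for this tag, so skip it
        by_attribute[attribute].append((tag, confidence))
    # For each attribute, keep the candidate with the highest confidence.
    return {
        attribute: max(candidates, key=lambda pair: pair[1])
        for attribute, candidates in by_attribute.items()
    }


# Made-up output of a general-purpose tagger for one product photo.
flat_tags = [
    ("maroon", 0.91),
    ("long sleeve", 0.88),
    ("red", 0.42),
    ("crew neck", 0.79),
    ("solid", 0.66),
    ("fabric", 0.95),
]

structured = structure_tags(flat_tags)
print(structured["color"])
```

With the tags grouped this way, a question like "what color is the shirt?" becomes a simple field lookup instead of a manual sort through the list.
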
Users (not necessarily consumers, as Wooders explained later) can mix and match, knowing that when they say "long sleeves" the system has actually looked at the sleeves of the garment and determined that they are long.

The system was trained on a growing library of around 11 million product images and corresponding descriptions, which the system parses using natural language processing to figure out what's referring to what. That gives important contextual clues that prevent the model from thinking "formal" is a color or "cute" is an occasion.
But you'd be right in thinking that it's not quite as easy as just plugging in the data and letting the network figure it out; that is only a sort of idealized version of how it works.

"There's a lot of ambiguity in fashion terms and that's definitely a problem," Wooders admitted, but far from an insurmountable one.
"When we provide the output for our customers we sort of give each attribute a score. So if it's ambiguous, whether it's a crew neck or a scoop neck, if the algorithm is working correctly it'll put a lot of weight on both. If it's not sure, it'll give a lower confidence score. Our models are trained on the aggregate of how people labeled things, so you get an average of what people's opinion is."

The model was initially aimed at fashion and clothing in general, but with the right training data it can apply to plenty of other categories as well: the same algorithms could find the defining characteristics of cars, beauty products and so on.
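
The per-attribute scoring Wooders describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Glisten's model: it assumes each image has been labeled by several people, treats the share of people choosing a label as that label's score, and uses a made-up margin to decide when two labels are close enough to count as ambiguous. The vote counts and function names are invented for the example.

```python
def label_scores(votes):
    """Turn counts of human labels into per-label scores (fraction of all votes)."""
    total = sum(votes.values())
    return {label: count / total for label, count in votes.items()}


def ambiguous_labels(scores, margin=0.15):
    """Return, sorted by name, every label whose score is within `margin` of the top score."""
    top = max(scores.values())
    return sorted(label for label, score in scores.items() if top - score <= margin)


# Made-up example: 100 people labeled the neckline of one garment.
votes = {"crew neck": 45, "scoop neck": 40, "v-neck": 15}

scores = label_scores(votes)
print(scores)
print(ambiguous_labels(scores))
```

When opinion is split between two necklines, both end up with a lot of weight, and a downstream user can treat either as a plausible match.
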

For a shampoo bottle, for instance, instead of sleeves, cut and occasion you would have volume, hair type and paraben content.

Although shoppers will likely see the benefits of Glisten's tech in time, the company has found that its customers are actually two steps removed from the point of sale.

"What we realized over time was that the right customer is the customer who feels the pain point of having messy unreliable product data," Wooders explained.
"That's mainly tech companies that work with retailers. Our first customer was actually a pricing optimization company, another was a digital marketing company. Those are pretty outside what we thought the applications would be."

It makes sense if you think about it. The more you know about the product, the more data you have to correlate with consumer behaviors, trends and such.
Knowing summer dresses are coming back is one thing, but knowing blue and green floral designs with 3/4 sleeves are coming back is better.

Glisten co-founders Sarah Wooders (left) and Alice Deng

Competition is mainly internal tagging teams (the manual review we established none of us would like to do) and general-purpose computer vision algorithms, which don't produce the kind of structured data Glisten does.

Even ahead of Y Combinator's demo day next week the company is already seeing five figures of monthly recurring revenue, with their sales process limited to individual outreach to people they thought would find it useful.
"There's been a crazy amount of sales these past few weeks," Wooders said.

Soon Glisten may be powering many a product search engine online, though ideally you won't even notice; with luck you'll just find what you're looking for that much easier.

(This article originally had Alice Deng quoted throughout when in fact it was Wooders the whole time, a mistake in my notes. It has also been updated to better reflect that the system is applicable to products beyond fashion.)