Coding Challenge #49: Photo Mosaic with White House Social Media Images



In this Coding Challenge, I use a collection of the Obama Administration’s Facebook images to create a “photo mosaic” of President Obama with Processing (Java).

My Coding Challenge about social media data visualization with the White House data: https://youtu.be/UrznYJltZrU

The White House Social Media Data is available here: https://www.whitehouse.gov/blog/2017/01/05/new-lenses-first-social-media-presidency

The data and source code can be found in the ITP “Obamathon” GitHub…


36 Comments

  1. How long did your program take to draw your final Obama image?
    I programmed this myself in a slightly different way, with vanilla Java (8).
    I am replacing every pixel of the target image with a corresponding image that has the same average color (so I created an output image with the final number of pixels, i.e. goal-image width and height times sub-image width and height, and filled it according to the sub-images). But this takes a massive amount of time on my laptop.

    With the following parameters it takes roughly one hour, even with threading.
    Final image = 200 * 200 "px", with every "pixel" being replaced by a 30 * 30 pixel image, so the final image is (200 * 30)^2 pixels, i.e. 6000 * 6000.
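    A rough Processing-style sketch of that matching step (the commenter used vanilla Java; names like tileAverages are assumptions) shows where the time goes: every target pixel is compared against every tile's precomputed average color.

      // Brute-force matching: for each target pixel, scan every tile's
      // precomputed average color and keep the closest one.
      int[] chooseTiles(PImage target, color[] tileAverages) {
        int[] choice = new int[target.width * target.height];
        target.loadPixels();
        for (int i = 0; i < target.pixels.length; i++) {
          color c = target.pixels[i];
          float best = Float.MAX_VALUE;
          for (int t = 0; t < tileAverages.length; t++) {
            float dr = red(c) - red(tileAverages[t]);
            float dg = green(c) - green(tileAverages[t]);
            float db = blue(c) - blue(tileAverages[t]);
            float err = dr*dr + dg*dg + db*db;  // squared distance is enough for ranking
            if (err < best) { best = err; choice[i] = t; }
          }
        }
        return choice;  // index of the chosen tile for every target pixel
      }

    The cost is (target pixels) × (number of tiles) comparisons, before the 6000 × 6000 output is even assembled, which is consistent with a long run time.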

  2. I'm thinking: load one image at a time, find which pixel in the main image most closely matches its brightness, then store both that filename and brightness level in a 2D array for each pixel (or a 1D array, left to right, top to bottom). Then, if there's already a picture chosen for that pixel, compare whether it's a better match than the one that's already there. That way only one copy of each image is saved, which should solve the memory issue; only the end result of chosen pictures gets loaded all at the same time. If you see missing pixels, maybe add the ability, when a pixel already has a match, to propagate the worse of the two (or, if they're equal, just the second match) to its next-closest match, and repeat that some number of times for each image. I'd imagine after 5 or 6 iterations there'd be no pixels left unfilled.
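    A minimal sketch of that streaming idea, assuming precomputed target brightness values and leaving the propagation step out (all names are hypothetical):

      // Only a filename and a brightness error are stored per pixel, so a
      // single candidate image needs to be in memory at any one time.
      // Both arrays are assumed to be sized to the number of target pixels.
      String[] chosenFile;   // filename picked for each target pixel (null = empty)
      float[]  chosenDiff;   // brightness error of that pick

      void placeImage(String filename, float imgBrightness, float[] targetBrightness) {
        // find the target pixel whose brightness is closest to this image
        int bestPixel = 0;
        float bestDiff = Float.MAX_VALUE;
        for (int i = 0; i < targetBrightness.length; i++) {
          float d = abs(targetBrightness[i] - imgBrightness);
          if (d < bestDiff) { bestDiff = d; bestPixel = i; }
        }
        // claim the slot only if it is empty or this image is a better match
        if (chosenFile[bestPixel] == null || bestDiff < chosenDiff[bestPixel]) {
          chosenFile[bestPixel] = filename;
          chosenDiff[bestPixel] = bestDiff;
        }
      }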

  3. You only need 256 images, so you could load images in little groups, calculate the brightness of each image, and add it to a dictionary where the key is the brightness and the value is the image, only if the key doesn't exist yet. You would read the images folder until the dictionary is full or no images remain. At the end you would only have 256 images loaded 🙂
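    A minimal Processing-style sketch of that 256-bucket idea (names such as byBrightness are assumptions):

      import java.util.HashMap;

      // One representative image per brightness value, keyed 0..255.
      HashMap<Integer, PImage> byBrightness = new HashMap<Integer, PImage>();

      void addIfMissing(PImage img) {
        img.loadPixels();
        float sum = 0;
        for (int i = 0; i < img.pixels.length; i++) {
          sum += brightness(img.pixels[i]);
        }
        int key = round(sum / img.pixels.length);   // average brightness, 0..255
        if (!byBrightness.containsKey(key)) {
          byBrightness.put(key, img);               // keep only the first image per bucket
        }
      }
      // Stop loading once byBrightness.size() == 256 or the folder is exhausted.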

  4. With the color, you would just have to get all the color values, compute the average, and if the main Obama picture has a pixel that's near that average, put the image there.
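    A small helper in that spirit, assuming Processing's default RGB color mode:

      // Average color of an image: sum each channel over all pixels, then divide.
      color averageColor(PImage img) {
        img.loadPixels();
        float r = 0, g = 0, b = 0;
        for (int i = 0; i < img.pixels.length; i++) {
          r += red(img.pixels[i]);
          g += green(img.pixels[i]);
          b += blue(img.pixels[i]);
        }
        int n = img.pixels.length;
        return color(r / n, g / n, b / n);
      }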

  5. I wrote a program years ago doing something similar. Your video made me think about the problem again and try out a couple of things. Here's what I found:
    First, I want to define some terms that I will use in the descriptions below:
    – tile, a rather small, square picture that is used as picture element in the output picture.
    – target picture, a downscaled version of the picture to match. It's the "target" for the matching algorithm.

    [edit: in the links below, I don't know what language the site uses for you, but if you click on the button in the center of the bigger pictures, the full resolution is loaded.]

    1) We need the tiles to be square and all the same size, so that really screams for preprocessing…

    2) Dealing with color is really difficult, because to do it properly you have to deal with color spaces etc. But I found that for this purpose, it's OK enough to assume linear color spaces even though most of them are actually non-linear. So let's assume that the difference of e.g. an intensity in red between 226 and 227 is exactly as high as the difference of an intensity in red between 120 and 121. Also assume that a difference in e.g. the red channel affects the color just as much as the same difference in the blue or green channel (which is also not true in most color spaces).
    Then every color can be represented as a point in the three-dimensional color space with red, green and blue as its axes. The "error" of a pixel color is then the distance between the point for the pixel color in the target picture and [edit: the point for] the pixel color in the tile that should match. The distance is (under the assumption of a linear color space): err = sqrt(dr*dr + dg*dg + db*db), as described by the Pythagorean theorem, where dx is the intensity difference in color channel x.
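    That per-pixel error, written as a small Processing-style helper (assuming the linear RGB space described above):

      // Euclidean distance between two colors in RGB space.
      float colorError(color a, color b) {
        float dr = red(a) - red(b);
        float dg = green(a) - green(b);
        float db = blue(a) - blue(b);
        return sqrt(dr*dr + dg*dg + db*db);
      }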

    3) The picture that we want to produce gets downscaled such that for every tile we want to have in the output picture, there's one pixel in the target picture, like so: https://ibb.co/n3hgkS
    Every tile gets downscaled to 1×1 pixels.
    Then for every pixel in the target, we calculate the error (color difference) for each 1×1 tile. We pick the one with the lowest error value, but draw in the output picture the normal-sized version (e.g. 80×80 pixels or so). We then get something like: https://ibb.co/kxAj5S
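    A sketch of that selection step, reusing colorError() from above; tiles1x1 is assumed to hold every tile downscaled to a single pixel (e.g. via resize(1, 1)):

      // For one target pixel, pick the tile whose 1x1 average color is closest.
      int bestTileFor(color targetPixel, PImage[] tiles1x1) {
        int best = 0;
        float bestErr = Float.MAX_VALUE;
        for (int t = 0; t < tiles1x1.length; t++) {
          float err = colorError(targetPixel, tiles1x1[t].get(0, 0));
          if (err < bestErr) { bestErr = err; best = t; }
        }
        return best;   // draw the full-size version of this tile in the output
      }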

    4) As you can see, the result is pretty good, but we can do much better! First, we can improve the matching by treating tiles not as one pixel of a certain color (their average color), but as a cluster of 2×2 or 4×4 pixels. This way, we can really exploit the fact that many pictures have a different basic shade at the top (or left) than at the bottom (or right). I found that using too high a resolution here is no good, because performance gets really bad and single pixels get matched while the general color is off. And the idea is to match areas of different colors, not to match features. E.g. a bike in the picture won't match anything in particular in the target anyway, because most probably the perspective is wrong or it's translated within the picture etc. (This is probably the most controversial sentence in this post, but I'll leave it at that (: ) The "matching error" is then the sum of all pixel errors of one tile. One remark, though: before, we could get away with not taking the square root in the formula above, because it does not affect the order of the error values, but now the square root is important because we sum up afterwards. To get an output of the same size as before, we need a higher-resolution version as input: https://ibb.co/jNHgkS
    So far, so good. With a tile matching 4×4 pixels in the target, we get something like this: https://ibb.co/isUizn
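    The matching error for one cluster, again reusing colorError() from above; tile4x4 and cluster are assumed to be 4×4 PImages (the downscaled tile and the corresponding 4×4 region of the target):

      // Sum of per-pixel errors; the square root inside colorError() matters
      // here, as noted above, because the errors are summed afterwards.
      float clusterError(PImage tile4x4, PImage cluster) {
        float sum = 0;
        for (int y = 0; y < 4; y++) {
          for (int x = 0; x < 4; x++) {
            sum += colorError(tile4x4.get(x, y), cluster.get(x, y));
          }
        }
        return sum;
      }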

    5) Much better already, but there's still whole areas in the output picture that use the same tile over and over again, so let's not use the single best match, but a random tile of the best n matches. I found n=10 is pretty good (but that depends on the number of tiles the algorithm has at its disposal). In fact, let's not pick one with uniform distribution, but prefer tiles with a better match. I used a very simple approach that behaves quite nicely: choose two random values within the range and pick the smaller one (smaller indexes in the array refer to smaller errors in my implementation). With this algorithm, the complexity for choosing a random value is still constant, and higher indexes are picked with linearly decreasing probability.
    This gives us something like: https://ibb.co/ka2Tzn
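    The biased random pick can be sketched in a couple of lines (index 0 is assumed to be the best match, as in the implementation described above):

      // Take the smaller of two uniform random indexes: index 0 is the most
      // likely pick, and the probability falls off linearly toward index n-1.
      int pickBiased(int n) {
        int a = int(random(n));
        int b = int(random(n));
        return min(a, b);
      }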

    Finally, the results obviously get better the more tile pictures you provide. Soon, inspecting every tile for every pixel (cluster) in the target is impractical. With about 36000 tiles, the matching for something about the dimensions of the examples above took over 15 min. Whereas now, the matching with 54000 tiles took only 20 sec. That's the matching only, not the assembly and writing to disk of the output picture. So what did I do? I created a database that contains, for every tile, the best n matches for similar tiles, and I use the top 50. The matching starts by testing against 50 tiles that are most different from each other (sampled beforehand). The idea is to get the general direction: is this a red area or a blue one, is it rather dark or bright? Then the algorithm looks at the "neighboring" tiles of the winner (looked up from the database) to find a better match, and continues to do so as long as a neighbor is better than the current best. With this approach, on average only ~170 tiles are actually tested until the final match is found. That's 170 out of 54000. The creation of the database took quite some time, though, and actually I searched for the best 1000 matches and created a reduced-to-50 version for testing. But the results have been so convincing that I stuck with it.
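    A sketch of that seed-and-neighbor search, reusing clusterError() from above; seeds and neighbors are assumed to be precomputed (the 50 most diverse tiles, and each tile's most similar tiles from the database):

      // Start from the best of a small, diverse seed set, then walk to a
      // precomputed "similar tile" neighbor whenever it matches better.
      int findTile(PImage cluster, int[] seeds, int[][] neighbors, PImage[] tiles4x4) {
        int current = seeds[0];
        float currentErr = clusterError(tiles4x4[current], cluster);
        for (int i = 1; i < seeds.length; i++) {
          float e = clusterError(tiles4x4[seeds[i]], cluster);
          if (e < currentErr) { currentErr = e; current = seeds[i]; }
        }
        boolean improved = true;
        while (improved) {
          improved = false;
          for (int t : neighbors[current]) {       // the ~50 most similar tiles
            float e = clusterError(tiles4x4[t], cluster);
            if (e < currentErr) { currentErr = e; current = t; improved = true; }
          }
        }
        return current;
      }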

    What I think would improve results further is to block a used tile for its neighboring positions only. But that's still to be done…

  6. Hi guys, I'm making a mosaic for my mum for Christmas, and I need a clue or hint before I continue. How could you use a horizontal slider to show more of an image? E.g. the pic is 900 x 700, but the size of the Processing window is 450 x 350. I think I would have to map the value of the slider to the picture width, and shift the pixels in the window by the value of that slider… I think. I don't want the answer, but if I could get a hint that would be great. Or should I just go to a forum? Love your videos, I'm learning a lot.
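    For readers who do want more than a hint, a minimal sketch of that mapping idea (mouseX stands in for the slider; "pic.jpg" is a placeholder filename for a 900 x 700 image):

      PImage pic;

      void setup() {
        size(450, 350);
        pic = loadImage("pic.jpg");   // assumed to be 900 x 700
      }

      void draw() {
        // map the slider position to a horizontal offset into the big image
        float offsetX = map(mouseX, 0, width, 0, pic.width - width);
        image(pic, -offsetX, 0);      // shift left; the window shows a 450-wide slice
      }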

  7. NULL POINTER EXCEPTION

  8. Thanks, Daniel, for the positive way you share your approach to coding, what to do from a start and what is open next with p5 and Processing. I keep on track every day whatever the subject, because I find it always amusing and a starting point for doing other stuff using your material. Somehow there's plenty of fancy stuff to do, it's a never-ending discovery, and I just like that. Thanks, Daniel.

  9. Hey, I really like your videos. But sometimes your imperative style of coding can be very confusing. I would suggest using a more functional approach. Because in your videos you're dealing with a lot of arrays, averages, sums, loops (PLEASE GET RID OF FOR LOOPS, PLEEEEASE), maps and such, a more declarative way of programming would be less messy and easier to understand, especially for people who are not that familiar with coding.
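    As a small illustration of that suggestion (plain Java 8, not how the video does it), an average can be written with a stream instead of an explicit for loop:

      import java.util.Arrays;

      // Average of an array of values with a stream; brightnessValues is an
      // assumed double array of per-pixel brightness values.
      double averageBrightness(double[] brightnessValues) {
        return Arrays.stream(brightnessValues).average().orElse(0);
      }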
