Error when using Convolutional Layer: Message: New shape length doesn't match original length

Friedemann Member, University Professor Posts: 27
Hi,

I am trying to set up a simple deep learning process using the Deep Learning extension and the MNIST dataset as CSVs of grey values from Kaggle. If I just use two fully connected layers inside the Deep Learning operator, everything works, but as soon as I add a convolutional layer and a pooling layer, the Apply Model step fails with an error message:

Exception: org.nd4j.linalg.exception.ND4JIllegalStateException
Message: New shape length doesn't match original length: [0] vs [6584816]. Original shape: [8399, 784] New Shape: [33601, 0, 784]

The test dataset is the result of a split operator, which is used to have 80% (33,601 records) of the data as training data and 20% (8,399 records) as test data.

What am I doing wrong? Any help is highly appreciated.
Friedemann

Best Answer

  • Mate Employee, Member Posts: 14 RM Team Member
    edited April 2021 Solution Accepted
    Well, I have never used a dataset where 3-channel (e.g. RGB) images were put into a single array, but I did conduct a little test now:

    This is one of the MNIST images, reduced to 4x4 and converted to a 3-channel image. (I stopped the process and looked directly at the data to see how those MNIST PNG images look under the hood; so this is not the ExampleSet use case, but rather the tutorial process where we deal with actual image files.)

    That means the tensor is 4D, because we are always talking about a collection of samples (in this case a single image), and the 3D tensor inside contains a 2D matrix for each channel/depth.
    This is the desired format I'd like to get to if I convert my manual dataset to a 3-channel one, as I did for 1 channel in my previous message.

    3 channels:
    [[0, 1.0000, 0, 0, 1.0000, 0, 0, 1.0000, 0, 0, 2.0000, 0, 0, 2.0000, 0, 0, 2.0000, 0, 0, 3.0000, 0, 0, 3.0000, 0, 0, 3.0000, 0]]

    becomes

    [[[[         0,    1.0000,         0], 
       [         0,    1.0000,         0], 
       [         0,    1.0000,         0]], 

      [[         0,    2.0000,         0], 
       [         0,    2.0000,         0], 
       [         0,    2.0000,         0]], 

      [[         0,    3.0000,         0], 
       [         0,    3.0000,         0], 
       [         0,    3.0000,         0]]]]

    So, this is a 3x3x3 image (height: 3, width: 3, depth/channel: 3).

    As you can see, if you put your data in a row in the right order (first channel, then second channel, then third channel) and set the correct Input Shape parameter on the network operator, you can even deal with multi-channel images sitting in a single ExampleSet row.
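
    For illustration, here is a minimal NumPy sketch of that channel-first reshape (NumPy stands in for ND4J here; both default to row-major ordering, so the semantics match):

    import numpy as np

    # One ExampleSet row: 27 grey values in channel-first order
    # (9 values for channel 1, then channel 2, then channel 3).
    row = np.array([[0, 1, 0, 0, 1, 0, 0, 1, 0,
                     0, 2, 0, 0, 2, 0, 0, 2, 0,
                     0, 3, 0, 0, 3, 0, 0, 3, 0]], dtype=float)

    # Reshape to NCHW: [samples, channels, height, width].
    tensor = row.reshape(1, 3, 3, 3)
    print(tensor)  # prints three 3x3 matrices, one per channel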

Answers

  • Mate Employee, Member Posts: 14 RM Team Member
    edited March 2021
    Hi Friedemann,

    as a first suggestion, I'd say have a look at one of our tutorial processes, since it also deals with the MNIST use case:
    "Add Convolutional Layer --> Tutorial Process --> MNIST classification"
    (Of course we do not provide the data, but you already have it, so that's no problem for you.)

    The error, by the way, occurs because a fully-connected layer is fed into a convolutional one.
    My intuition tells me that this shouldn't even be automatically possible, since CNNs work on multi-dimensional data, right?
    So the very least we would need, to make a potential automatic transformation possible, is a way to configure how to shape the 1-dimensional output of the given fully-connected layer, thus making it digestible for an upcoming convolutional layer, shouldn't we?
    At least I think so.
    Also, I am pretty sure we don't provide that functionality, and I guess it would be a bit strange and unusual, at least based on what I've seen in other CNN models (with the help of "Import Existing Model" you can check out the architectures of numerous famous models).
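
    To make the shape mismatch concrete, here is a minimal NumPy sketch (an illustration of the general principle, not of the extension's internals): a dense layer emits a 2D [batch, features] activation, while a 2D convolutional layer expects a 4D [batch, channels, height, width] input, so there is no unique way to reshape one into the other without extra configuration.

    import numpy as np

    dense_out = np.zeros((32, 784))  # dense-layer output: [batch, features]
    # A 2D convolution expects [batch, channels, height, width]; without
    # knowing channels/height/width there is no unique reshape:
    # 784 could be 1x28x28, 4x14x14, 16x7x7, ...
    conv_in = dense_out.reshape(32, 1, 28, 28)  # just one of many possible choices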

    Kind regards,
    Mate
  • Friedemann Member, University Professor Posts: 27
    Well, I think that I do not fully understand the response, but anyway, let me answer step by step:
    The tutorial process is based on the tensor version of Deep Learning and, honestly, I do not know how to create tensor data (and that is not part of the tutorial).
    Furthermore, I wonder why the training of the model works but the application fails. I would expect an error during the training phase if the CNN needs two-dimensional data.
    Finally, your answer implies that using the CSV format of the MNIST data will not work with the tutorial process and that a different representation is needed. Can you point me to the correct input format?
  • Friedemann Member, University Professor Posts: 27
    Follow-up on the previous entry: I understand now that you need the image extension in order to run the Deep Learning tutorial process. The image extension generates the tensor format from a directory of images, and the names of the sub-directories are used as the labels. I have downloaded the JPG version of the images, created the folder structure, and am currently running the training.
  • Friedemann Member, University Professor Posts: 27
    2nd follow-up: The tutorial process finished successfully after 45 minutes (CUDA does not work yet). I used the training set of images for both training and testing and get an overall accuracy of 9.97%; basically all images are classified as "9".
  • Mate Employee, Member Posts: 14 RM Team Member
    edited April 2021
    Well, let me address all 3 posts:

    1. Yes, the tutorial process uses a different "format" (as you also concluded in your follow-up response, so this part of my original answer becomes irrelevant, since you are using the CSV format, or at least you were).

      I'm not sure if the training worked, though. Can you see training scores in your log window while training? Sometimes, instead of an immediate error, you'll only get warnings informing you that the network could not be trained in the given epoch.

      My answer was rather implying that your network was not going to work regardless of the "format" of your data:
      putting a Dense Layer in front of a Convolutional Layer is problematic. That was what I intended to point out.

    2. You could have stuck with your original data format (1D, 28x28 = 784 columns/attributes/features for a single image), but you would then have needed to devise your network accordingly.

    3. I'll be honest with you, I only trained and tested on a reduced set.
      I took 100 images from each class, thus creating distinct 1,000-image datasets for training and testing respectively.
      Since I also used the CPU for training, I did not want to deal with 60,000 images.
      With this setup, the tutorial process achieved ~89% accuracy, with training only taking 1 minute 20 seconds (I'm on a laptop).
      Even if I then apply the model to the entire test dataset (around 10,000 images), it achieves around 90% accuracy. I'd say the model learnt pretty successfully how to distinguish those numbers (of course it can still be made more robust and better; I think LeNet achieves around 97%, if I remember correctly).

    Kind regards,
    Mate
  • Friedemann Member, University Professor Posts: 27
    edited April 2021
    Thanks for the update. So, how did you do the sampling based on images?

    Btw, I did use the CSV format with a deep learning network consisting of three fully connected layers and achieved an overall accuracy of approx. 97% (using the split operator and 20% test data).

    As a researcher I would like to map a non-image problem onto the image-recognition case. However, I would need to create "images" from the data first before being able to use the CNN layers, right?
  • pschlunder Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RM Researcher, Member Posts: 96
    One way of sampling is to use the "Read Image Meta-Data" operator followed by a sampling operator. The first one only reads in the paths of the images, so you can, e.g., sample stratified to get around 100 images per class, as in the demo process attached. The actual images are only loaded later, when the "Pre-process images" operator normalizes them and transforms them into a tensor.

    If you want to apply a convolutional network or something similar to non-image data, you can use one of the other tensor-conversion operators as well, e.g. the "ExampleSet to Tensor" operator. It requires two columns specifying the IDs for the tensor object to be created. That way you can use all RapidMiner operators to modify the data as ExampleSets and later convert it into a tensor object. Direct changes on a tensor object are not possible as of now, though.
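
    As a purely hypothetical illustration of that layout (the column names below are made up, not the operator's actual parameter values): one ID column marks which tensor sample a row belongs to, the other marks its position within that sample, so a flat table can be regrouped into a 3D tensor.

    import numpy as np

    # Hypothetical flat table: sample_id groups rows into tensor samples,
    # step_id orders the rows within each sample; f1/f2 are features.
    rows = [
        # (sample_id, step_id, f1, f2)
        (0, 0, 0.1, 0.2),
        (0, 1, 0.3, 0.4),
        (1, 0, 0.5, 0.6),
        (1, 1, 0.7, 0.8),
    ]
    tensor = np.zeros((2, 2, 2))  # assumed layout: [samples, features, time steps]
    for s, t, f1, f2 in rows:
        tensor[s, :, t] = (f1, f2)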

    Furthermore, please have a look at the data section of our documentation page; maybe it helps clarify a few things: https://docs.rapidminer.com/latest/studio/installation/deep-learning-extension.html#data

    I hope this helps,
    Philipp

  • Friedemann Member, University Professor Posts: 27
    edited April 2021
    Hi Philipp,

    thanks for the explanation of image sampling. Got that!

    I tried to use the "ExampleSet to Tensor" operator, but the CNN layer complains about the tensor format (the operator generates a tensor for time-series data rather than the format the CNN layer expects). It seems that an option for "CNN data" is missing.
    Did you ever use the operator successfully with a CNN layer?

    Cheers
    Friedemann
  • Mate Employee, Member Posts: 14 RM Team Member
    edited April 2021
    Hey, I just tried this process; it seems to be training and scoring (although pretty badly with the current settings):
    [see attachment]


    CNN using 1D input (784 + 1 columns).
    Mind the Input Shape setting on the Deep Learning model operator.

  • Friedemann Member, University Professor Posts: 27
    edited April 2021
    Hi Mate,
    thanks a lot for the response. Now we have been mixing up two aspects:
    • One question was how to use CNNs with the CSV version of the MNIST data.
      That has been answered by your response. In a previous response I was told that I need two-dimensional data for a CNN, but it seems that if you use the "manual input shape specification", the 1D version of the DL4J package is invoked.
      Update:
      I have read the explanation of "Convolutional Flattened" multiple times now. I guess my understanding was incorrect. The explanation says that the four-dimensional input is converted to two dimensions. What kind of four-dimensional input is expected? The CSV input is 2-dimensional (label, pixel_values). Do I need data like in the next question (label, image, pixel_values_of_line), so that the 2D kernel will be invoked?
    • The actual question in this thread is how to prepare a tensor from an ExampleSet containing data laid out like sequential patterns (label, image, pixel_values_of_line) for use with a "regular" 2D CNN.
      I guess that I need to select the "Convolutional" shape option, but which format of the ExampleSet is expected then?
    Cheers
    Friedemann
  • Mate Employee, Member Posts: 14 RM Team Member
    Since I also got confused for a second, and I am probably the one who confused you here, let me try to make up for that now.

    • When I wrote "CNNs work on multi-dimensional data", I was probably a bit off, since that is not entirely accurate. In the case of signal processing (and I believe that is the original, mathematical concept of convolution), 1D convolution is possible/available, even in DL4J, and btw we also support that if you choose the Recurrent input type(s).

      However, when it comes to multidimensional signals, like images and videos, 2D and 3D convolutions take over.
      I think this is what I was referring to when I said you'd need multi-dimensional data if you were to use convolutional layers for image processing.

      Now, in my above process, the 2D convolutional layer is being used, and that was what I was previously also looking for, but I got confused by the documentation. Hence "configure how to shape the 1-dimensional output of given fully-connected layer thus making it digestible for an upcoming convolutional layer". Except that now I removed the fully-connected/dense layer and just needed to make the 1D data directly digestible for the convolutional one.
      I basically told DL4J: "hey, these are 784 elements in an array, but I want to have them as 28x28". I literally just reshaped my data before consuming it in the convolutional layer. Under the hood this works by using pre-processors, in this case FeedForwardToCnnPreProcessor (see the sketch after this list).

    • If you want to deal with sequential data, now that's a bit of a different story, I guess.
      I don't think we have a tutorial process for multivariate time series involving convolutional layers, but that is certainly possible, especially with the right selection of input shape. Although, in this case, I'd definitely use one of the recurrent options.
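
    A minimal NumPy sketch of what that pre-processing step amounts to (an illustration of the reshape semantics, not DL4J's actual code; ND4J and NumPy both default to row-major ordering):

    import numpy as np

    batch = np.random.rand(32, 784)        # ExampleSet rows: [batch, 28*28 pixels]
    # What FeedForwardToCnnPreProcessor effectively does for a 28x28
    # grayscale Input Shape: reshape the flat rows to NCHW.
    images = batch.reshape(32, 1, 28, 28)  # [batch, channels, height, width]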

    Kind regards,
    Mate
  • Friedemann Member, University Professor Posts: 27
    Hi Mate,

    that's great progress! So, what you are saying is that the 784 pixel values are converted into a 2D matrix of 28 x 28. That's pretty much what I am looking for (and I hope that this conversion is part of the model and will be used when the model is applied). However, I am not sure what the format of the input data has to be. Does the conversion assume that the 2D matrix was "flattened" row-wise or column-wise?
    Follow-up questions:
    Can I use value pairs for the data "points" rather than just single values by setting the depth value to 2?
    Any chance for a supported input format that consists of one line per row of the input matrix?

    Cheers
    Friedemann
  • Mate Employee, Member Posts: 14 RM Team Member
    edited April 2021
    Yes, it (the pre-processor I mentioned) is part of the architecture/model.
    I am not sure if I understand the rest of your question, but here is an example of what is happening in the background:

    Original data, a collection of rows (~an array of arrays; here it corresponds to an ExampleSet with a single row):
    [[         0,    1.0000,         0,         0,    1.0000,         0,         0,    1.0000,         0]]

    Now said pre-processor takes the above data and reshapes it into this:
    [[[[         0,    1.0000,         0], 
       [         0,    1.0000,         0], 
       [         0,    1.0000,         0]]]]

    Let's say the above "image" stands for the number 1.
    Number of channels / depth was set to 1 (in my sample process as well), since our data is grayscale.

    Now this format can directly be processed by the convolutional layer.
  • Friedemann Member, University Professor Posts: 27
    So, the data is interpreted row-wise: the first three values represent the first row, values 4 to 6 represent the second row, and so on.
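
    A quick NumPy check of that row-wise interpretation (assuming ND4J's default row-major ordering, which NumPy's reshape also follows):

    import numpy as np

    row = np.arange(9).reshape(1, 9)  # values 0..8 in a single ExampleSet row
    img = row.reshape(1, 1, 3, 3)
    print(img[0, 0, 0])  # [0 1 2] -> the first three values form the first row
    print(img[0, 0, 1])  # [3 4 5] -> values 4 to 6 form the second row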

    Now, my question is: what happens if you have a pair of values per "pixel"? Or, in general, what is expressed by the depth parameter, and how does it affect the interpretation of the ExampleSet?

  • Friedemann Member, University Professor Posts: 27
    Perfect! Thanks a lot for the clarification. The actual question has been answered.
    If I have pairs of values for each "pixel", I will handle that similarly to a color channel: each channel just follows in the sequence of pixel values.
    Just out of curiosity, because this format will cause very long lines in the CSV file and thus will be hard for human beings to read: are there any plans to support further "text formats" for tensors, with grouping concepts based on attributes?

  • Friedemann Member, University Professor Posts: 27
    Now the reality check:
    When running the MNIST example (CSV version) with the manual input shape setting using the CUDA backend, I get an error message saying: Failed to allocate 1,685,712,896 bytes from HOST memory.

    When running with the CPU backend, my machine simply runs out of memory and starts swapping (JVM memory is restricted to 10,000 MB and backend memory to 10 GB).

    I guess that something is very wrong with the FeedForwardToCnnPreProcessor.
