Spontaneous speech is an important source of information for aphasia research. It is essential to collect the right amount of data: enough for distinctions in the data to become meaningful, but not so much that data collection becomes too expensive or places an undue burden on participants. Participant burden is an ethical consideration when working with people who find speaking difficult, such as speakers with aphasia. So how much speech data is enough to draw meaningful conclusions? How does the uncertainty in the estimated parameters of a predictive model vary as a function of the length of the texts used for training?