Friday, August 2, 2013

[android help] Reading file takes too long





My application starts by parsing a ~100MB file from the SD card and takes minutes to do so. To put that in perspective, on my PC, parsing the same file takes seconds.


I started by naively implementing the parser with Matcher and Pattern, but DDMS showed that 90% of the time was spent evaluating regular expressions, and parsing the file took more than half an hour. The pattern is ridiculously simple; a line consists of:



ID (a number) LANG (a 3-to-5 character string) DATA (the rest)


I then tried String.split. It did not improve things significantly, probably because String.split treats its argument as a regular expression too. At that point I decided to rewrite the parser entirely, and ended up with something like this:



    // Exception handling is omitted for brevity; sentenceFile and
    // allSentences are fields of the enclosing task.
    protected Collection<Sentence> doInBackground( Void... params ) {
        BufferedReader reader = new BufferedReader( new FileReader( sentenceFile ) );

        // Read the file line by line and hand each line to the parser.
        String currentLine = null;
        while ( (currentLine = reader.readLine()) != null ) {
            treatLine( currentLine, allSentences );
        }

        reader.close();
        return allSentences;
    }

    private void treatLine( String line, Collection<Sentence> allSentences ) {
        char[] str = line.toCharArray();

        // ...
        // walk the array of chars to extract an id, a language and some data

        allSentences.add( new Sentence( id, lang, data ) );
    }
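
For illustration, the elided parsing step could be written without any regular expressions, using only indexOf and substring. This is a minimal sketch, not the code from the question: it assumes the three fields are separated by a single tab character (the delimiter is not stated above), and SentenceLineParser and ParsedLine are hypothetical names standing in for the Sentence class, which is not shown.

```java
final class SentenceLineParser {

    /** Hypothetical holder for one parsed line (stand-in for Sentence). */
    static final class ParsedLine {
        final int id;
        final String lang;
        final String data;

        ParsedLine(int id, String lang, String data) {
            this.id = id;
            this.lang = lang;
            this.data = data;
        }
    }

    /**
     * Splits "ID<TAB>LANG<TAB>DATA" into its three fields.
     * ASSUMPTION: a single tab separates the fields.
     */
    static ParsedLine parse(String line) {
        // Locate the two delimiters; DATA may itself contain further tabs.
        int first = line.indexOf('\t');
        int second = (first < 0) ? -1 : line.indexOf('\t', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException(
                    "Expected 3 tab-separated fields: " + line);
        }
        int id = Integer.parseInt(line.substring(0, first));
        String lang = line.substring(first + 1, second);
        String data = line.substring(second + 1);
        return new ParsedLine(id, lang, data);
    }
}
```

Only the first two delimiters are searched for, so everything after the second one is kept as DATA unchanged.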


And I noticed a huge boost: it took minutes instead of half an hour. But I wasn't satisfied with this, so I profiled again and found that the bottleneck was now BufferedReader.readLine. I wondered whether it was IO-bound, or whether a lot of time was being spent filling an intermediate buffer I don't really need. So I rewrote the whole thing using FileReader directly:



    protected Collection<Sentence> doInBackground( Void... params ) {
        FileReader reader = new FileReader( sentenceFile );
        int currentChar;
        // One iteration of the outer loop consumes one line of the file.
        while ( (currentChar = reader.read()) != -1 ) {
            // parse an id
            // ...

            // parse a language (the loop exits at the end of the field)
            while ( (currentChar = reader.read()) != -1 ) {
                // do some parsing stuff
            }

            // parse the sentence data (the loop exits at the end of the line)
            while ( (currentChar = reader.read()) != -1 ) {
                // parse parse parse
            }

            allSentences.add( new Sentence( id, lang, data ) );
        }

        reader.close();
        return allSentences;
    }


I was quite surprised to find that performance got much worse. Most of the time is now spent in FileReader.read. I guess reading one char at a time is expensive.


Now I am a bit out of ideas. Any tips?
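
One direction that may be worth measuring is reading the file in large chunks, so that the per-call overhead of read() is paid once per chunk rather than once per character. The sketch below is only an illustration under assumptions: ChunkedReading and countNewlines are hypothetical names, and the inner loop merely counts newlines where the real id/lang/data state machine would go. It uses the standard Reader.read(char[], int, int) overload.

```java
import java.io.IOException;
import java.io.Reader;

final class ChunkedReading {

    /**
     * Reads everything from the reader in chunks of up to bufferSize chars
     * and counts the '\n' characters seen. The counting is a placeholder
     * for real per-character parsing.
     */
    static int countNewlines(Reader reader, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        int newlines = 0;
        int filled;
        // One read() call fills as much of the buffer as is available.
        while ((filled = reader.read(buffer, 0, buffer.length)) != -1) {
            // Scan only the part of the buffer that was actually filled.
            for (int i = 0; i < filled; i++) {
                if (buffer[i] == '\n') {
                    newlines++;
                }
            }
        }
        return newlines;
    }
}
```

It could be called as ChunkedReading.countNewlines( new FileReader( sentenceFile ), 64 * 1024 ); the buffer size here is a guess that would need tuning on the device. A field that straddles two chunks has to be carried over by the parser state, which this sketch does not show.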



Source: stackoverflow.com


