Reading file takes too long
My application starts by parsing a ~100MB file from the SD card and takes minutes to do so. To put that in perspective, on my PC, parsing the same file takes seconds.
I started by naively implementing the parser using Matcher and Pattern, but DDMS told me that 90% of the time was spent evaluating regular expressions, and it took more than half an hour to parse the file. The pattern is ridiculously simple; a line consists of:
ID (a number)
LANG (a 3-to-5 character string)
DATA (the rest)
I then tried String.split. It didn't show any significant improvement, probably because this function uses regular expressions itself. At that point I decided to rewrite the parser entirely, and ended up with something like this:
protected Collection<Sentence> doInBackground( Void... params ) {
    BufferedReader reader = new BufferedReader( new FileReader( sentenceFile ) );
    String currentLine = null;
    // read the file line by line and parse each line by hand
    while ( (currentLine = reader.readLine()) != null ) {
        treatLine( currentLine, allSentences );
    }
    reader.close();
    return allSentences;
}

private void treatLine( String line, Collection<Sentence> allSentences ) {
    char[] str = line.toCharArray();
    // ...
    // treat the array of chars into an id, a language and some data
    allSentences.add( new Sentence( id, lang, data ) );
}
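For illustration, here is a minimal sketch of regex-free line parsing of this kind. It is not the elided body of treatLine above: it assumes the three fields are separated by single tab characters (the separator is not stated in this post), and the LineParser class and its nested Sentence type are hypothetical stand-ins.

```java
final class LineParser {

    // Hypothetical value type standing in for the Sentence class used above.
    static final class Sentence {
        final int id;
        final String lang;
        final String data;

        Sentence(int id, String lang, String data) {
            this.id = id;
            this.lang = lang;
            this.data = data;
        }
    }

    // Splits one line into id, language and data without any regular expression.
    // ASSUMPTION: fields are separated by a single tab character.
    static Sentence parseLine(String line) {
        int firstTab = line.indexOf('\t');
        int secondTab = line.indexOf('\t', firstTab + 1);
        if (firstTab < 0 || secondTab < 0) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        int id = Integer.parseInt(line.substring(0, firstTab));
        String lang = line.substring(firstTab + 1, secondTab);
        // Everything after the second tab is data, even if it contains more tabs.
        String data = line.substring(secondTab + 1);
        return new Sentence(id, lang, data);
    }

    public static void main(String[] args) {
        Sentence s = parseLine("42\teng\tHello, world");
        System.out.println(s.id + " | " + s.lang + " | " + s.data);
    }
}
```

Only the first two separators are searched for, so the data field is never scanned beyond what indexOf needs.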
The hand-written parser gave a huge boost: parsing took minutes instead of half an hour. But I wasn't satisfied with this, so I profiled again and found that the bottleneck was now BufferedReader.readLine. I wondered: it could be IO-bound, but it could also be that a lot of time is spent filling an intermediate buffer I don't really need. So I rewrote the whole thing using FileReader directly:
protected Collection<Sentence> doInBackground( Void... params ) {
    FileReader reader = new FileReader( sentenceFile );
    int currentChar;
    // read the file one char at a time, with no intermediate buffer
    while ( (currentChar = reader.read()) != -1 ) {
        // parse an id
        // ...
        // parse a language
        while ( (currentChar = reader.read()) != -1 ) {
            // do some parsing stuff
        }
        // parse the sentence data
        while ( (currentChar = reader.read()) != -1 ) {
            // parse parse parse
        }
        allSentences.add( new Sentence( id, lang, data ) );
    }
    reader.close();
    return allSentences;
}
I was quite surprised to find that performance got much worse. Most of the time is now spent in FileReader.read, obviously. I guess reading a single char at a time costs a lot.
Now I am a bit out of inspiration. Any tips?
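One direction that might be worth measuring sits between the two attempts above: read large chunks into a char[] and split lines by hand, so the cost of each read call is spread over many characters. The sketch below is untested on a device and makes no claim about actual speed. The ChunkedLineReader class, the readLines name and the bufferSize parameter are hypothetical, and it assumes lines end with '\n'.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ChunkedLineReader {

    // Reads the input in chunks of bufferSize chars and splits it on '\n'.
    // ASSUMPTION: lines are terminated by '\n' only; bufferSize must be > 0.
    static List<String> readLines(Reader reader, int bufferSize) throws IOException {
        List<String> lines = new ArrayList<>();
        char[] buffer = new char[bufferSize];
        StringBuilder current = new StringBuilder();
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
            int lineStart = 0;
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    // Complete the pending line with the chars before the newline.
                    current.append(buffer, lineStart, i - lineStart);
                    lines.add(current.toString());
                    current.setLength(0);
                    lineStart = i + 1;
                }
            }
            // Keep the unfinished tail of this chunk for the next iteration.
            current.append(buffer, lineStart, read - lineStart);
        }
        // A last line without a trailing newline still counts.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("1\teng\tfoo\n2\tfra\tbar\n");
        System.out.println(readLines(reader, 4));
    }
}
```

In the real parser each completed line would be handed straight to the parsing code instead of being collected in a list, and the Reader would be a FileReader on the sentence file with a much larger buffer than the 4 chars used in main.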
stackoverflow.com