Fastest way to write huge data to a text file in Java

I need to write huge data to a text file [csv]. I used BufferedWriter to write the data, and it took about 40 seconds to write 174 MB. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter(new FileWriter("fileName.csv"));

Note: these 40 seconds include the time spent iterating over and fetching the records from the result set. 🙂 The 174 MB corresponds to 400,000 rows in the result set.

    You could try removing the BufferedWriter and using the FileWriter directly. On a modern system, there is a good chance you are just writing to the drive's cache memory anyway.

    It takes me between 4 and 5 seconds to write 175 MB (4 million strings); this is on a dual-core 2.4 GHz Dell running Windows XP with an 80 GB, 7200 RPM Hitachi disk.

    Can you isolate how much of the time is spent retrieving records and how much is spent writing the file?
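
    As a rough sketch of how the two costs could be separated, the loop below times the fetch and the write of each record independently. The `Iterator<String>` is only a stand-in for the real result set, and `SplitTiming`, `Result`, and `copy` are hypothetical names, not anything from the question. The per-record `System.nanoTime()` calls add some overhead of their own, so treat the numbers as a ratio rather than an exact measurement.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

public class SplitTiming {

    /** Time spent fetching and writing, in nanoseconds, plus the record count. */
    public static final class Result {
        public final long fetchNanos;
        public final long writeNanos;
        public final int records;

        Result(long fetchNanos, long writeNanos, int records) {
            this.fetchNanos = fetchNanos;
            this.writeNanos = writeNanos;
            this.records = records;
        }
    }

    /** Copies every record from source to out, timing fetch and write separately. */
    public static Result copy(Iterator<String> source, Writer out) throws IOException {
        long fetchNanos = 0;
        long writeNanos = 0;
        int records = 0;
        while (true) {
            long t0 = System.nanoTime();
            boolean more = source.hasNext();
            String record = more ? source.next() : null;
            long t1 = System.nanoTime();
            fetchNanos += t1 - t0;          // time attributed to the source
            if (!more) {
                break;
            }
            out.write(record);
            writeNanos += System.nanoTime() - t1;  // time attributed to the writer
            records++;
        }
        long t2 = System.nanoTime();
        out.flush();                         // count the final flush as write time
        writeNanos += System.nanoTime() - t2;
        return new Result(fetchNanos, writeNanos, records);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in the real case the iterator would wrap the result set.
        Iterator<String> source = Arrays.asList("a,1\n", "b,2\n", "c,3\n").iterator();
        Result r = copy(source, new StringWriter());
        System.out.println(r.records + " records, fetch " + r.fetchNanos
                + " ns, write " + r.writeNanos + " ns");
    }
}
```

    If the fetch time dominates, tuning the writer will not help much.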

     import java.io.BufferedWriter;
     import java.io.File;
     import java.io.FileWriter;
     import java.io.IOException;
     import java.io.Writer;
     import java.util.ArrayList;
     import java.util.List;

     public class FileWritingPerfTest {

         private static final int ITERATIONS = 5;
         private static final double MEG = (Math.pow(1024, 2));
         private static final int RECORD_COUNT = 4000000;
         private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
         private static final int RECSIZE = RECORD.getBytes().length;

         public static void main(String[] args) throws Exception {
             // Build the test data in memory so only the write is timed.
             List<String> records = new ArrayList<String>(RECORD_COUNT);
             int size = 0;
             for (int i = 0; i < RECORD_COUNT; i++) {
                 records.add(RECORD);
                 size += RECSIZE;
             }
             System.out.println(records.size() + " 'records'");
             System.out.println(size / MEG + " MB");

             for (int i = 0; i < ITERATIONS; i++) {
                 System.out.println("\nIteration " + i);

                 writeRaw(records);
                 writeBuffered(records, 8192);
                 writeBuffered(records, (int) MEG);
                 writeBuffered(records, 4 * (int) MEG);
             }
         }

         private static void writeRaw(List<String> records) throws IOException {
             File file = File.createTempFile("foo", ".txt");
             try {
                 FileWriter writer = new FileWriter(file);
                 System.out.print("Writing raw... ");
                 write(records, writer);
             } finally {
                 // comment this out if you want to inspect the files afterward
                 file.delete();
             }
         }

         private static void writeBuffered(List<String> records, int bufSize) throws IOException {
             File file = File.createTempFile("foo", ".txt");
             try {
                 FileWriter writer = new FileWriter(file);
                 BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

                 System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
                 write(records, bufferedWriter);
             } finally {
                 // comment this out if you want to inspect the files afterward
                 file.delete();
             }
         }

         // Writes all records, flushes and closes the writer, then prints the elapsed time.
         private static void write(List<String> records, Writer writer) throws IOException {
             long start = System.currentTimeMillis();
             for (String record : records) {
                 writer.write(record);
             }
             writer.flush();
             writer.close();
             long end = System.currentTimeMillis();
             System.out.println((end - start) / 1000f + " seconds");
         }
     }

    Try memory-mapped files (it takes 300 ms to write 174 MB on my machine, a Core 2 Duo with 2.5 GB of RAM):

     import java.io.RandomAccessFile;
     import java.nio.ByteBuffer;
     import java.nio.channels.FileChannel;

     byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
     int number_of_lines = 400000;

     // Map the whole output region up front, then copy each line into the mapping.
     FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
     ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, (long) buffer.length * number_of_lines);
     for (int i = 0; i < number_of_lines; i++) {
         wrBuf.put(buffer);
     }
     rwChannel.close();

    Just for the statistics:

    The machine is an old Dell with a new SSD

    CPU: Intel Pentium D 2.8 GHz

    SSD: Patriot Inferno 120 GB SSD

     4000000 'records'
     175.47607421875 MB

     Iteration 0
     Writing raw... 3.547 seconds
     Writing buffered (buffer size: 8192)... 2.625 seconds
     Writing buffered (buffer size: 1048576)... 2.203 seconds
     Writing buffered (buffer size: 4194304)... 2.312 seconds

     Iteration 1
     Writing raw... 2.922 seconds
     Writing buffered (buffer size: 8192)... 2.406 seconds
     Writing buffered (buffer size: 1048576)... 2.015 seconds
     Writing buffered (buffer size: 4194304)... 2.282 seconds

     Iteration 2
     Writing raw... 2.828 seconds
     Writing buffered (buffer size: 8192)... 2.109 seconds
     Writing buffered (buffer size: 1048576)... 2.078 seconds
     Writing buffered (buffer size: 4194304)... 2.015 seconds

     Iteration 3
     Writing raw... 3.187 seconds
     Writing buffered (buffer size: 8192)... 2.109 seconds
     Writing buffered (buffer size: 1048576)... 2.094 seconds
     Writing buffered (buffer size: 4194304)... 2.031 seconds

     Iteration 4
     Writing raw... 3.093 seconds
     Writing buffered (buffer size: 8192)... 2.141 seconds
     Writing buffered (buffer size: 1048576)... 2.063 seconds
     Writing buffered (buffer size: 4194304)... 2.016 seconds

    As we can see, the raw method is slower than the buffered one.
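
    One way to see what the buffer does, independent of any disk, is to count how many times the wrapped Writer is actually called. The sketch below is only an illustration: `WriteCallCount`, `CountingWriter`, and `countCalls` are made-up names, the sink discards its data, and the counts say nothing about real I/O time. It just shows that BufferedWriter batches many small writes into fewer, larger calls.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

public class WriteCallCount {

    /** Hypothetical sink: discards all data and counts calls to write(char[], int, int). */
    public static final class CountingWriter extends Writer {
        public int calls = 0;

        @Override
        public void write(char[] cbuf, int off, int len) {
            calls++;
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /**
     * Writes the record the given number of times and returns how many calls
     * reached the sink. A bufSize of 0 means "no BufferedWriter in between".
     */
    public static int countCalls(int records, String record, int bufSize) throws IOException {
        CountingWriter sink = new CountingWriter();
        Writer writer = bufSize > 0 ? new BufferedWriter(sink, bufSize) : sink;
        for (int i = 0; i < records; i++) {
            writer.write(record);
        }
        writer.flush();
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        String record = "Help I am trapped in a fortune cookie factory\n";
        System.out.println("unbuffered calls: " + countCalls(1000, record, 0));
        System.out.println("buffered calls:   " + countCalls(1000, record, 8192));
    }
}
```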

    Your transfer speed is probably not limited by Java. Instead, I would suspect (in no particular order)

    1. the transfer speed from the database
    2. the transfer speed to the disk

    If you read the complete dataset and only then write it out to disk, it will take longer, because the JVM has to allocate memory for all of it and the database read and the disk write happen one after the other. Instead, I would write to the buffered writer after every read from the database, so that the two operations come closer to running concurrently (I don't know whether you are already doing that or not).

    For database reads this large, you might want to tune your Statement's fetch size. It could save a lot of round trips to the DB.

    http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
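
    A minimal sketch of both suggestions together, assuming a plain JDBC `Connection` is available: it sets a fetch size hint and writes each row as soon as it is read. `CsvExport`, `export`, and `toCsvLine` are hypothetical names, the fetch size you pass in is something to experiment with, and the CSV formatting deliberately skips quoting and escaping. Note that `setFetchSize` is only a hint, so a given driver may ignore it.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CsvExport {

    /** Joins fields with commas. No quoting or escaping: an assumption made for brevity. */
    public static String toCsvLine(String[] fields) {
        return String.join(",", fields) + "\n";
    }

    /**
     * Streams the query result to the writer one row at a time and returns
     * the number of rows written. Nothing is accumulated in memory.
     */
    public static long export(Connection conn, String sql, Writer out, int fetchSize)
            throws SQLException, IOException {
        long rows = 0;
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(fetchSize);   // hint: rows to fetch per round trip
            try (ResultSet rs = st.executeQuery(sql)) {
                int cols = rs.getMetaData().getColumnCount();
                String[] fields = new String[cols];
                while (rs.next()) {
                    for (int c = 0; c < cols; c++) {
                        String value = rs.getString(c + 1);  // JDBC columns are 1-based
                        fields[c] = value == null ? "" : value;
                    }
                    out.write(toCsvLine(fields));
                    rows++;
                }
            }
        }
        out.flush();
        return rows;
    }

    public static void main(String[] args) {
        // No database here; this only shows the line format produced per row.
        System.out.print(toCsvLine(new String[] {"42", "some value"}));
    }
}
```

    In use, `out` would typically be the BufferedWriter wrapped around the FileWriter from the question.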

     package all.is.well;

     import java.io.IOException;
     import java.io.RandomAccessFile;
     import java.util.concurrent.ExecutorService;
     import java.util.concurrent.Executors;

     import junit.framework.TestCase;

     /**
      * @author Naresh Bhabat
      *
      * Following implementation helps to deal with extra large files in java.
      * This program is tested for dealing with 2GB input file.
      * There are some points where extra logic can be added in future.
      * Please note: if we want to deal with binary input file, then instead of
      * reading line, we need to read bytes from read file object.
      *
      * It uses random access file, which is almost like streaming API.
      *
      * ****************************************
      * Notes regarding executor framework and its readings.
      * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
      *
      * for 10 threads: Total time required for reading and writing the text in seconds: 349.317
      * For 100: Total time required for reading the text and writing: seconds 464.042
      * For 1000: Total time required for reading and writing text: 466.538
      * For 10000: Total time required for reading and writing in seconds: 479.701
      */
     public class DealWithHugeRecordsinFile extends TestCase {

         static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
         static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
         static volatile RandomAccessFile fileToWrite;
         static volatile RandomAccessFile file;
         static volatile String fileContentsIter;
         static volatile int position = 0;

         public static void main(String[] args) throws IOException, InterruptedException {
             long currentTimeMillis = System.currentTimeMillis();

             try {
                 fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random write, independent of thread obstacles
                 file = new RandomAccessFile(FILEPATH, "r"); // for random read, independent of thread obstacles
                 seriouslyReadProcessAndWriteAsynch();
             } catch (IOException e) {
                 // TODO Auto-generated catch block
                 e.printStackTrace();
             }
             Thread currentThread = Thread.currentThread();
             System.out.println(currentThread.getName());
             long currentTimeMillis2 = System.currentTimeMillis();
             double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
             System.out.println("Total time required for reading the text in seconds " + time_seconds);
         }

         /**
          * @throws IOException
          * Something asynchronously serious
          */
         public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
             ExecutorService executor = Executors.newFixedThreadPool(10); // pls see for explanation in comments section of the class
             while (true) {
                 String readLine = file.readLine();
                 if (readLine == null) {
                     break;
                 }
                 Runnable genuineWorker = new Runnable() {
                     @Override
                     public void run() {
                         // do hard processing here in this thread, i have consumed
                         // some time and eat some exception in write method.
                         writeToFile(FILEPATH_WRITE, readLine);
                         // System.out.println(" :" +
                         // Thread.currentThread().getName());
                     }
                 };
                 executor.execute(genuineWorker);
             }
             executor.shutdown();
             while (!executor.isTerminated()) {
             }
             System.out.println("Finished all threads");
             file.close();
             fileToWrite.close();
         }

         /**
          * @param filePath
          * @param data
          * @param position
          */
         private static void writeToFile(String filePath, String data) {
             try {
                 // fileToWrite.seek(position);
                 data = "\n" + data;
                 if (!data.contains("Randomization")) {
                     return;
                 }
                 System.out.println("Let us do something time consuming to make this thread busy" + (position++) + " :" + data);
                 System.out.println("Lets consume through this loop");
                 int i = 1000;
                 while (i > 0) {
                     i--;
                 }
                 fileToWrite.write(data.getBytes());
                 throw new Exception();
             } catch (Exception exception) {
                 System.out.println("exception was thrown but still we are able to proceeed further"
                         + " \n This can be used for marking failure of the records");
                 // exception.printStackTrace();
             }
         }
     }