Improve the speed of inserting a large amount of data

Asked April 2019 · Viewed 76 times

I have a REST service that takes an XML file with 400,000 records; each record contains the following fields: code, type, price.

In the DB (MySQL) I have a table named PriceData with 2,000,000 rows. The purpose of this service is: select all PriceData rows from the DB matching the code and type values from the XML, replace the price of each PriceData with the price from the XML, and if there is no PriceData with a given code and type, create a new one with the provided price.

Right now it works like this: select one PriceData from the DB according to the first record from the XML, set the new price or create a new PriceData, save it, and repeat these steps 400,000 times. (It takes about 5 minutes.)
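For reference, a minimal sketch of that per-record flow (the dao and modelService method names here are assumptions, not the actual code):

private void updateOneByOne(final List<XmlData> prices) {
    for (final XmlData price : prices) {
        // hypothetical accessor: one SELECT per XML record
        PriceData model = dao.findByCodeAndType(price.getCode(), price.getType());
        if (model == null) {
            // no matching row: create a new one
            model = new PriceData(price.getCode(), price.getType());
        }
        model.setPrice(price.getPrice());
        // hypothetical: one INSERT or UPDATE per record
        modelService.save(model);
    }
}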

I want to speed up this process.

First try:

Select 1000 elements at a time from PriceData, and once all elements have been selected, update them:

Code:

private void updateAll(final List<XmlData> prices /* data from XML */) {
        final List<PriceData> models = new ArrayList<>();
        // load the existing rows in chunks of 1000 (code, type) pairs;
        // Math.min guards the last, possibly shorter, chunk
        for (int begin = 0; begin < prices.size(); begin += 1000) {
            final int end = Math.min(begin + 1000, prices.size());
            models.addAll(dao.findByPrices(prices.subList(begin, end)));
        }

        final Map<String, XmlData> xmlData = prices.stream()
                .collect(Collectors.toMap(this::keyForPriceDate, e -> e));

        final Map<String, PriceData> modelMap = models.stream()
                .collect(Collectors.toMap(this::keyForRowModel, e -> e));

        final List<PriceData> modelsToSave = new ArrayList<>();
        for (final Map.Entry<String, XmlData> entry : xmlData.entrySet()) {
            final XmlData price = entry.getValue();
            PriceData model = modelMap.get(entry.getKey());

            if (model == null) {
                model = onEmptyPriceData(price); // no existing row: create one
            }
            model.setPrice(price.getPrice());
            modelsToSave.add(model);
        }
        modelService.saveAll(modelsToSave);
    }

I convert the two lists to maps so I can check whether a PriceData already exists (the keys for xmlData and modelMap are built as code + type).
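For illustration, the keys could be built like this (the exact separator is an assumption; the real keyForPriceDate and keyForRowModel just need to agree):

// hypothetical key builders: concatenate code and type
private String keyForPriceDate(final XmlData data) {
    return data.getCode() + "|" + data.getType();
}

private String keyForRowModel(final PriceData model) {
    return model.getCode() + "|" + model.getType();
}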

The findByPrices method creates a query in the following format:

select * from PriceData where (code = 123 and type = 'qwe') or (...) -- and this OR repeats 1000 times

Now it takes 2 minutes.

Second try:

Select all PriceData rows from the DB (2 million)

and use the algorithm above.

It takes 3 minutes. The first try is better, but in the future my service may receive 500,000 records, and I want to know which approach will perform better in that scenario, or whether there is a better way to do this task.

My select method:

public List<PriceData> findByPrices(final List<XmlData> selectData) {
        final StringBuilder query = new StringBuilder("SELECT * FROM PriceData ");
        query.append("WHERE \n");
        final Iterator<XmlData> selectDataIterator = selectData.iterator();
        while(selectDataIterator.hasNext()){
            final XmlData data = selectDataIterator.next();
            query.append("( \n")
                 .append("productCode = " + data.getProductId() + " \n")
                 .append(" AND type = '" + data.getPriceind() + "' \n") // type is a string, so it must be quoted
                 .append(" ) \n");
            if(selectDataIterator.hasNext()){
                query.append("OR \n");
            }
        }
        final SearchResult<PriceData> searchRes = search(query.toString());
        /*
         Here I use a custom mapper that maps the result list to my objects
        */
        return searchRes.getResult();
    }

1 answer


You should use the MySQL INSERT ... ON DUPLICATE KEY UPDATE statement, combined with JDBC batch processing. This of course assumes that (code, type) is the primary key, or is at least covered by a unique index.
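If that unique index does not exist yet, it could be added like this (the constraint name is made up):

ALTER TABLE PriceData
    ADD CONSTRAINT uk_pricedata_code_type UNIQUE (code, type);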

private void updateAll(final List<XmlData> prices) throws SQLException {
    String sql = "INSERT INTO PriceData (code, type, price)" +
                " VALUES (?,?,?)" +
                " ON DUPLICATE KEY" +
                " UPDATE price = ?";
    try (PreparedStatement stmt = this.conn.prepareStatement(sql)) {
        int batchSize = 0;
        for (XmlData price : prices) {
            if (batchSize == 1000) { // flush batch every 1000
                stmt.executeBatch();
                batchSize = 0;
            }
            stmt.setInt       (1, price.getCode());
            stmt.setString    (2, price.getType());
            stmt.setBigDecimal(3, price.getPrice()); // price for the INSERT
            stmt.setBigDecimal(4, price.getPrice()); // same price again for the UPDATE clause
            stmt.addBatch();
            batchSize++;
        }
        if (batchSize != 0) // flush the final partial batch
            stmt.executeBatch();
    }
}

You can tweak the batch size, but never flushing will use a lot of memory. I think 1000 statements per batch is good, but I have no numbers backing that.
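For completeness, a hypothetical call site (the JDBC URL, credentials, and how the connection is wired into the method are assumptions). With MySQL Connector/J, setting rewriteBatchedStatements=true lets the driver rewrite a batch into multi-row INSERTs, which tends to help exactly this kind of workload:

// hypothetical usage: run the whole import in one transaction
try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true",
        "user", "password")) {
    conn.setAutoCommit(false); // commit once at the end, not per statement
    updateAll(prices);         // the method above, given this connection
    conn.commit();
}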