How to sum up CSV column without excel

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

rblack
Pro Scripter
Posts: 87
Joined: Sat Dec 22, 2007 12:39 pm

How to sum up CSV column without excel

Post by rblack » Sun Nov 03, 2013 12:20 am

Hi All,

Im stuck with this problem: I have some pretty large CSV files (1mb each), and have to automate suming one precise column (the third one, filled with float numbers).

How to do that? The .CSV has each time different number of rows.

I tried to make a function, which found 2nd and 3rd semicolon, and ripped the text between, but I think the Position> function can't handle the semicolon in Macro Scheduler 11... which I have.

Maybe some SQL trick should be done here? Has anyone encountered such problem? To make things harder, I want to use this script on a server without Microsoft Office installed.
Last edited by rblack on Sun Nov 03, 2013 1:33 pm, edited 1 time in total.

User avatar
JRL
Automation Wizard
Posts: 3518
Joined: Mon Jan 10, 2005 6:22 pm
Location: Iowa

Post by JRL » Sun Nov 03, 2013 3:40 am

You could use CSVFileToArray> then add up the array results of the third column like this.

I created a fake csv file that is 30,000 rows and 10 columns with numbers in the third column. The following script gives me an answer in about 20 seconds. It might be faster if you use SQL but if the computer doesn't have Office installed you won't be able to query a text file. See Marcus' blog for a way to fix that.

Code: Select all

Timer>start
CSVFileToArray>%temp_dir%fakeCSV.csv,var


Let>Total=0
Let>kk=0
Repeat>kk
  Add>kk,1
  Let>Value=var_%kk%_2
  LEt>Total=%Total%+%value%
Until>kk=var_count

Timer>stop

Let>TotalTime=%stop%-%start%

MDl>%Total%%crlf%%TotalTime%

    rblack
    Pro Scripter
    Posts: 87
    Joined: Sat Dec 22, 2007 12:39 pm

    Post by rblack » Sun Nov 03, 2013 1:21 pm

    Thank you JRL! Your answer was very fast. 20 seconds is also fast enough. I need to do that operation twice a day.

    The problem is, that I do not have the function: CSVFileToArray in my Macro Scheduler 12... so maybe it can be done somehow with DBQuery function?

    rblack
    Pro Scripter
    Posts: 87
    Joined: Sat Dec 22, 2007 12:39 pm

    Post by rblack » Sun Nov 03, 2013 1:32 pm

    ah... I have Macro Scheduler 11, sorry - my mistake in spelling.

    User avatar
    JRL
    Automation Wizard
    Posts: 3518
    Joined: Mon Jan 10, 2005 6:22 pm
    Location: Iowa

    Post by JRL » Mon Nov 04, 2013 5:38 am

    I think this will work for column 3 in version 11. The regex was provided by Marcus here

    Code: Select all

    Timer>start
    ReadFile>%temp_dir%fakeCSV.csv,vData
    Separate>vData,crlf,vLine
    
    Let>Total=0
    Let>LL=0
    Repeat>LL
      Add>LL,1
      Let>text=vLine_%LL%
      //RegEx to extract each field to an array
      Let>pattern="[^"\r\n]*"|[^,\r\n]*
      RegEx>pattern,text,0,field,num,0
      //RegEx to remove starting and ending quotes if present
      //RegEx>^"|"$,field_3,0,matches,n,1,,field_3
      Let>Total=%Total%+%field_3%
      Let>field_3=0
    Until>LL=vLine_Count
    
    Timer>stop
    Let>TotalTime=%stop%-%start%
    
    MDl>%Total%%crlf%%TotalTime%%crlf%%vLine_Count%
    

    armsys
    Automation Wizard
    Posts: 1108
    Joined: Wed Dec 04, 2002 10:28 am
    Location: Hong Kong

    Post by armsys » Mon Nov 04, 2013 7:40 am

    Hi JPL,
    Would you please show us a sample script of using SQL to sum up a column in a CSV (text) file? Thanks.

    rblack
    Pro Scripter
    Posts: 87
    Joined: Sat Dec 22, 2007 12:39 pm

    Post by rblack » Mon Nov 04, 2013 11:21 am

    JRL it works like a charm! This is a really pretty code :)

    User avatar
    JRL
    Automation Wizard
    Posts: 3518
    Joined: Mon Jan 10, 2005 6:22 pm
    Location: Iowa

    Post by JRL » Mon Nov 04, 2013 2:45 pm

    @armsys,

    Here's a sample using SQL.

    Code: Select all

    Let>FileLocationPath=%temp_dir%
    Let>CSVFileName=FakeCSV.csv
    //You need a header row in the CSV file.  In this case I just added F1,F2,F3,etc
    Let>ColumnNameToSum=F3
    
    Timer>Start
    GoSub>GetSumFromCSV
    Timer>stop
    Let>TotalTime=%stop%-%start%
    
    MDL>Time to Process = %TotalTime%%crlf%Sum = %SummedResult_1_1%
    
    SRT>GetSumFromCSV
      Let>ConStr=Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=%FileLocationPath%;Extensions=asc,csv,tab,txt;
      DBConnect>ConStr,dbH
      Let>sql=select SUM(%ColumnNameToSum%) from %CSVFileName%;
      DBQuery>dbH,sql,SummedResult,numrecords,numfields
      DBClose>dbH
    END>GetSumFromCSV
    

    armsys
    Automation Wizard
    Posts: 1108
    Joined: Wed Dec 04, 2002 10:28 am
    Location: Hong Kong

    Post by armsys » Mon Nov 04, 2013 8:24 pm

    HI JPL,
    Thanks for your fast help.
    I tried your SQL sample script as follows:

    Code: Select all

    Let>FileLocationPath=C:\Temp\
    Let>CSVFileName=ur_lib_import.txt
    //You need a header row in the CSV file.  In this case I just added F1,F2,F3,etc
    Let>ColumnNameToSum=FileSize
    
    Timer>Start
    GoSub>GetSumFromCSV
    Timer>stop
    Let>TotalTime=%stop%-%start%
    
    MDL>Time to Process = %TotalTime%%crlf%Sum = %SummedResult_1_1%
    
    SRT>GetSumFromCSV
      Let>ConStr=Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=%FileLocationPath%;Extensions=asc,csv,tab,txt;
      DBConnect>ConStr,dbH
      Let>sql=select SUM(%ColumnNameToSum%) from %CSVFileName%;
      DBQuery>dbH,sql,SummedResult,numrecords,numfields
      DBClose>dbH
    END>GetSumFromCSV
    But at line 17, an error message pops up:
    Error in: __debug.dbg
    Line 17 - [Microsoft][ODBC TextDriver] Too few
    parameters. Expected1

    User avatar
    JRL
    Automation Wizard
    Posts: 3518
    Joined: Mon Jan 10, 2005 6:22 pm
    Location: Iowa

    Post by JRL » Mon Nov 04, 2013 8:27 pm

    Does your csv file have column headers? I got that same highly uninformative message before I put in headers.

    armsys
    Automation Wizard
    Posts: 1108
    Joined: Wed Dec 04, 2002 10:28 am
    Location: Hong Kong

    Post by armsys » Fri Nov 08, 2013 1:59 pm

    JRL wrote:Does your csv file have column headers? I got that same highly uninformative message before I put in headers.
    Sorry for my late reply. :oops:
    The CSV text:

    Code: Select all

    URL,FileName,FilePath,FileExt,FileDateModified,FileDateCreated,FileSize,FileHash
    "C:\Developing Large Web Applications.pdf","Developing Large Web Applications.pdf","C:","pdf","11/7/2013 12:15:14 PM","11/7/2013 12:14:34 PM","1941715","2bededce623e705a8b1f398f318a5c530e3432537a8254f8187b2cbd981ff735"
    The purpose is total the "FileSize" of, say, 1,000 documents.

    User avatar
    JRL
    Automation Wizard
    Posts: 3518
    Joined: Mon Jan 10, 2005 6:22 pm
    Location: Iowa

    Post by JRL » Fri Nov 08, 2013 2:29 pm

    Armsys,

    Took your data and created "armsys.csv" in my temp directory. The following script gives me the correct answer.

    Code: Select all

    Let>FileLocationPath=%temp_dir%
    Let>CSVFileName=armsys.csv
    //You need a header row in the CSV file.  In this case I just added F1,F2,F3,etc
    Let>ColumnNameToSum=FileSize
    
    Timer>Start
    GoSub>GetSumFromCSV
    Timer>stop
    Let>TotalTime=%stop%-%start%
    
    MDL>Time to Process = %TotalTime%%crlf%Sum = %SummedResult_1_1%
    
    SRT>GetSumFromCSV
      Let>ConStr=Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=%FileLocationPath%;Extensions=asc,csv,tab,txt;
      DBConnect>ConStr,dbH
      Let>sql=select SUM(%ColumnNameToSum%) from %CSVFileName%;
      DBQuery>dbH,sql,SummedResult,numrecords,numfields
      DBClose>dbH
    END>GetSumFromCSV
    

    armsys
    Automation Wizard
    Posts: 1108
    Joined: Wed Dec 04, 2002 10:28 am
    Location: Hong Kong

    Post by armsys » Fri Nov 08, 2013 9:52 pm

    JRL wrote:Took your data and created "armsys.csv" in my temp directory. The following script gives me the correct answer.
    The same "Microsoft ODBC text driver too few parameters expected 1" error message pops up. I suspect it's likely caused by my ODBC driver installation. Once I find out the answer, I'll let you know.
    JPL, thanks for taking time to validate the script.

    armsys
    Automation Wizard
    Posts: 1108
    Joined: Wed Dec 04, 2002 10:28 am
    Location: Hong Kong

    Post by armsys » Sat Nov 09, 2013 1:59 pm

    JRL,
    After 2 days of repeated experiments, I succeed.
    My test CSV can be simplified as follows:

    Code: Select all

    A1,A2
    1,2
    3,4
    My new source:

    Code: Select all

    Timer>Start
    GoSub>GetSumFromCSV
    Timer>stop
    Let>TotalTime=%stop%-%start%
    MDL>Time to Process = %TotalTime%%crlf% Records = %num_recs%  %num_fields%  Sum = %Sum_1_1%
    SRT>GetSumFromCSV
      Let>ConStr=Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\ArmSys\ODBC\;
      DBConnect>ConStr,dbH
      Let>sql=select SUM(%ColumnNameToSum%)from armsys.txt
      DBQuery>dbH,select SUM(A1) from armsys.txt,Sum,num_recs,num_fields
      DBClose>dbH
    END>GetSumFromCSV
    For 10,000 records, it takes 76ms. Not too bad.
    JRL, thanks for your help. :lol:

    UncleBen
    Newbie
    Posts: 6
    Joined: Thu Oct 02, 2014 12:22 pm

    Re: How to sum up CSV column without excel

    Post by UncleBen » Thu Oct 02, 2014 12:57 pm

    Hi there, I've got a similar problem. I do not have Office installed. I would like to sum up the seventh column in my csv file. This column doesn't have any caption. The solution that you provided breaks when there is a semicolon inside the double quoted in text any column before the seventh column. Here is an example.

    Code: Select all

    2012-01-01;2012-01-01;"column3";"column4";"column5";"column6";"6,60";"USD";"140,60";USD
    2012-01-01;2012-01-01;"column3";"column4";"column5";"column6";"50,00";"USD";"134,00";USD
    2012-01-01;2012-01-01;"column3";"column4";"column5";"[b]column6 with ;semicolon inside[/b]";"43,13";"USD";"84,00";USD
    2012-01-01;2012-01-01;"column3";"column4";"column5";"column6";"40,87";"USD";"40,87";USD
    Can someone help me to construct a RegEx that will work also when there is no semicolon inside the quoted text and will work also when the text is separeted by a comma not semicolon?

    Post Reply
    Sign up to our newsletter for free automation tips, tricks & discounts