Dealing with odd characters

Technical support and scripting issues

Phil Pendlebury
Automation Wizard
Posts: 543
Joined: Tue Jan 16, 2007 9:00 am

Post by Phil Pendlebury » Sun Sep 08, 2024 1:11 pm

Hi all,

I am trying to read a file, but I can only get the first word, and even that doesn't match when I try to use it in an IF statement.

I think this is due to special characters in the file, which Notepad++, for example, shows as:
NUL
SUB
NULNULNULNUL
etc.

Trying to paste any of the file here produces a garbled mess, completely different from what shows in NP++.

Any thoughts on this?
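
For anyone following along: NUL and SUB in Notepad++ are its labels for non-printable control bytes (0x00 and 0x1A in ASCII), not literal text in the file. Below is a minimal Python sketch, not Macro Scheduler code, that renders raw bytes in a similar style. The `show_bytes` helper and the sample bytes are made up for illustration.

```python
# Names for a few ASCII control bytes, similar to the labels Notepad++ displays.
CONTROL_NAMES = {0x00: "NUL", 0x09: "TAB", 0x0A: "LF", 0x0D: "CR", 0x1A: "SUB"}


def show_bytes(data: bytes) -> str:
    """Render bytes as text, replacing control and non-ASCII bytes with labels."""
    parts = []
    for b in data:
        if b in CONTROL_NAMES:
            parts.append(f"[{CONTROL_NAMES[b]}]")
        elif 32 <= b < 127:
            parts.append(chr(b))  # printable ASCII stays as-is
        else:
            parts.append(f"[0x{b:02X}]")  # anything else is shown as a hex label
    return "".join(parts)


# Hypothetical sample, not taken from the real file.
sample = b"AB\x00\x1a\x00C"
print(show_bytes(sample))  # AB[NUL][SUB][NUL]C
```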
Phil Pendlebury - Linktree

Grovkillen
Automation Wizard
Posts: 1115
Joined: Fri Aug 10, 2012 2:38 pm
Location: Bräcke, Sweden

Re: Dealing with odd characters

Post by Grovkillen » Sun Sep 08, 2024 1:52 pm

Could you upload an example file to an online service? It is hard to tell without looking at the characters themselves.

Or just try to escape them.
viewtopic.php?f=8&t=10911&p=46412&hilit=Encode#p46602
Let>ME=%Script%

Running: 15.0.27
version history


Post by Phil Pendlebury » Sun Sep 08, 2024 2:07 pm

Sure. Here you go:

https://we.tl/t-03dqQYm5g4

Unescaping won't work, I think. As mentioned, I cannot read the file past the first word.

Post by Grovkillen » Mon Sep 09, 2024 12:05 pm

Maybe this will help you?

Code:

Let>FILE_PATH=C:\..\..\..\RockPack.cpr.txt
Let>BYTES_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
DeleteFile>TEMP_ps1_file
LabelToVar>ps_read_file_on_byte_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE

Let>RP_CAPTURESTDOUT=1
Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
Let>RP_CAPTURESTDOUT=0
DeleteFile>TEMP_ps1_file
Trim>RP_STDOUT,RP_STDOUT
Separate>RP_STDOUT,SPACE,ARRAY_OF_CHARACTERS
**BREAKPOINT**


/*
ps_read_file_on_byte_level:
# Function to read file and display bytes in hexadecimal format
function Get-FileBytes {
    param (
        [string]$FilePath,
        [int]$MaxBytes
    )

    # Open the file as a binary stream
    $fileStream = [System.IO.File]::OpenRead($FilePath)
    $binaryReader = New-Object System.IO.BinaryReader($fileStream)
    
    $byteList = @()  # Array to store the bytes
    
    try {
        # Read bytes from the file
        for ($i = 0; $i -lt $MaxBytes; $i++) {
            $byte = $binaryReader.ReadByte()
            $hexByte = '{0:X2}' -f $byte  # Convert byte to hex
            $byteList += $hexByte
        }
    }
    catch {
        Write-Host "Reached end of file or encountered error."
    }
    finally {
        # Close the file stream
        $binaryReader.Close()
        $fileStream.Close()
    }
    
    return $byteList  # Return the list of hex bytes
}

# Specify the file path and number of bytes to read
$file = "%FILE_PATH%"
$maxBytes = %BYTES_TO_ANALYSE%

# Call the function
$byteData = Get-FileBytes -FilePath $file -MaxBytes $maxBytes

# Output the result
$byteData -join ' '  # Join bytes with space for display
*/
Save the script and change these two lines to suit your setup:

Let>FILE_PATH=C:\..\..\..\RockPack.cpr.txt
Let>BYTES_TO_ANALYSE=10

Post by Grovkillen » Mon Sep 09, 2024 12:19 pm

Alternatively, to read the file as UTF-8 and list each character with its Unicode code point:

Code:

Let>FILE_PATH=C:\..\..\..\RockPack.cpr.txt
Let>CHARACTERS_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
DeleteFile>TEMP_ps1_file
LabelToVar>ps_read_file_on_unicode_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE

Let>RP_CAPTURESTDOUT=1
Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
Let>RP_CAPTURESTDOUT=0
DeleteFile>TEMP_ps1_file
Trim>RP_STDOUT,RP_STDOUT
Separate>RP_STDOUT,CRLF,ARRAY_OF_CHARACTERS
**BREAKPOINT**

/*
ps_read_file_on_unicode_level:
# Function to read a file and display the Unicode code point of each character
function Get-FileUnicode {
    param (
        [string]$FilePath,
        [int]$MaxChars
    )

    # Read the file as text with UTF-8 encoding
    $content = Get-Content -Path $FilePath -Raw -Encoding UTF8
    
    $unicodeList = @()  # Array to store the characters and their Unicode code points
    
    # Loop through the characters and limit to the specified number of characters
    for ($i = 0; $i -lt [Math]::Min($MaxChars, $content.Length); $i++) {
        $char = $content[$i]
        $unicodeValue = [int][char]$char  # Get the Unicode code point (decimal)
        $hexValue = '{0:X4}' -f $unicodeValue  # Convert to hexadecimal format
        $unicodeList += "$char : U+$hexValue"
    }
    
    return $unicodeList  # Return the list of characters and their Unicode code points
}

# Specify the file path and number of characters to read
$file = "%FILE_PATH%"
$maxChars = %CHARACTERS_TO_ANALYSE%

# Call the function and get the Unicode data
$unicodeData = Get-FileUnicode -FilePath $file -MaxChars $maxChars

# Output the result
$unicodeData
*/

Post by Phil Pendlebury » Mon Sep 09, 2024 12:44 pm

Wow, that is ingenious and way beyond my knowledge level.

However, the first script did gather useful results. The output looks like a series of hex byte values, so I guess I will have to convert them somehow. (I am looking into this now.)

I set

Code:

Let>BYTES_TO_ANALYSE=100

Code:

0: ARRAY_OF_CHARACTERS_1=52
0: ARRAY_OF_CHARACTERS_2=49
0: ARRAY_OF_CHARACTERS_3=46
... up until 
0: ARRAY_OF_CHARACTERS_11=4E
etc. up to 100 lines
.
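
Those values are hex byte codes, so converting them back is a simple lookup: 52, 49 and 46 are the ASCII codes for R, I and F. Here is a small Python sketch of the conversion, written outside Macro Scheduler. `hex_tokens_to_text` is a hypothetical helper name, and non-printable bytes are shown as dots.

```python
def hex_tokens_to_text(tokens):
    """Convert hex byte strings such as '52' to text; non-printable bytes become '.'."""
    data = bytes(int(token, 16) for token in tokens)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# The first three values from the debug output above.
print(hex_tokens_to_text(["52", "49", "46"]))  # RIF

# A hypothetical mix of a NUL byte (00), a SUB byte (1A) and a printable one (4E).
print(hex_tokens_to_text(["00", "1A", "4E"]))  # ..N
```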

The second script (UTF-8) seemed to gather fewer characters.

I am not sure how to continue but I will keep at it.

Thank you.

Post by Grovkillen » Mon Sep 09, 2024 1:39 pm

You may have better results writing the output to a file:

Code:

Let>FILE_PATH=C:\Users\jimmy.westberg\Downloads\RockPack.cpr.txt
Let>CHARACTERS_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
Let>TEMP_output_file=%SCRIPT_DIR%\temp_output.txt
DeleteFile>TEMP_ps1_file
DeleteFile>TEMP_output_file
LabelToVar>ps_read_file_on_unicode_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE

Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
**BREAKPOINT**

ReadFile>TEMP_output_file,TEMP_contents
Trim>TEMP_contents,TEMP_contents
Separate>TEMP_contents,CRLF,ARRAY_OF_CHARACTERS
DeleteFile>TEMP_ps1_file
DeleteFile>TEMP_output_file

/*
ps_read_file_on_unicode_level:
# Function to read a file and display the Unicode code point of each character
function Get-FileUnicode {
    param (
        [string]$FilePath,
        [int]$MaxChars
    )

    # Read the file as text with UTF-8 encoding
    $content = Get-Content -Path $FilePath -Raw -Encoding UTF8
    
    $unicodeList = @()  # Array to store the characters and their Unicode code points
    
    # Loop through the characters and limit to the specified number of characters
    for ($i = 0; $i -lt [Math]::Min($MaxChars, $content.Length); $i++) {
        $char = $content[$i]
        $unicodeValue = [int][char]$char  # Get the Unicode code point (decimal)
        $hexValue = '{0:X4}' -f $unicodeValue  # Convert to hexadecimal format
        $unicodeList += "$char : U+$hexValue"
    }
    
    return $unicodeList  # Return the list of characters and their Unicode code points
}

# Specify the file path and number of characters to read
$file = "%FILE_PATH%"
$maxChars = %CHARACTERS_TO_ANALYSE%

# Call the function and get the Unicode data
$unicodeData = Get-FileUnicode -FilePath $file -MaxChars $maxChars

# Output the result
$unicodeData | Out-File -FilePath "%TEMP_output_file%"
*/
I suspect that RP_STDOUT is written to a file using ANSI encoding, which would mangle any character outside that code page.
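
To illustrate why the encoding of the captured output matters, here is a small Python sketch of two mismatches. It assumes "ANSI" means Windows-1252, which is only true on some systems, and it says nothing about how Macro Scheduler actually captures RP_STDOUT. The second half is included because, as far as I know, Out-File in Windows PowerShell 5.1 writes UTF-16LE unless -Encoding is given.

```python
# Text containing one non-ASCII character.
original = "Bräcke"

# Written as UTF-8, then read back as Windows-1252 (assumed meaning of "ANSI").
misread = original.encode("utf-8").decode("cp1252")
print(misread)  # BrÃ¤cke

# UTF-16LE stores each ASCII character followed by a zero byte,
# so a reader expecting single-byte text sees NULs between the letters.
utf16_bytes = "RIFF".encode("utf-16-le")
print(utf16_bytes)  # b'R\x00I\x00F\x00F\x00'
```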

Post by Phil Pendlebury » Mon Sep 09, 2024 6:37 pm

Thanks, that is also ingenious, but it is still giving me NUL characters and stopping after RIFF. I will keep looking at the code you provided, though. Thanks.

Post by Grovkillen » Mon Sep 09, 2024 7:46 pm

But then they are null characters? Reading the file on a binary level is as low as you can go.
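
One possible reason a text-oriented read stops early: many string routines treat the first NUL byte as the end of the string. Whether ReadFile behaves this way is an assumption, not something confirmed in this thread. Here is a Python sketch with made-up sample bytes (`visible_prefix` is a hypothetical helper):

```python
def visible_prefix(data: bytes) -> bytes:
    """Return the bytes before the first NUL, as a NUL-terminated string reader would see them."""
    return data.split(b"\x00", 1)[0]


# Hypothetical sample: four letters, four bytes that include NULs, then more data.
sample = b"RIFF\x00\x00\x10\x00DATA"

print(visible_prefix(sample))  # b'RIFF'
print(len(sample))             # 12, so the full content is much longer than the visible part
```

Reading the file as bytes, as in the first script, avoids this because nothing is interpreted as a terminator.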

Post by Phil Pendlebury » Wed Sep 11, 2024 4:35 am

Grovkillen wrote (Mon Sep 09, 2024 7:46 pm): "But then they are null characters? Reading the file on a binary level is as low as you can go."

OK, thank you. I am slightly out of my comfort zone with this, but that is a good thing; it is nice to learn new things. I will keep at it.