Dealing with odd characters
- Phil Pendlebury
- Automation Wizard
- Posts: 543
- Joined: Tue Jan 16, 2007 9:00 am
- Contact:
Dealing with odd characters
Hi all,
I am trying to read a file and I can only get the first word, and even that doesn't match when I try to use it in an IF statement.
I think this is due to special characters in the file, which Notepad++, for example, shows as
NUL
SUB
NULNULNULNUL
etc.
Trying to paste any of the file here shows a garbled mess, completely different to what shows in NP++.
Any thoughts on this?
Phil Pendlebury - Linktree
- Grovkillen
- Automation Wizard
- Posts: 1131
- Joined: Fri Aug 10, 2012 2:38 pm
- Location: Bräcke, Sweden
- Contact:
Re: Dealing with odd characters
Can't you upload an example to an online service? It's hard to tell without looking at the characters themselves.
Or just try to escape them.
viewtopic.php?f=8&t=10911&p=46412&hilit=Encode#p46602
- Phil Pendlebury
Re: Dealing with odd characters
Sure. Here you go:
https://we.tl/t-03dqQYm5g4
Unescaping won't work, I think. As mentioned, I cannot read the file past the first word.
- Grovkillen
Re: Dealing with odd characters
Maybe this will help you?
Save the script and change the first two lines (FILE_PATH and BYTES_TO_ANALYSE) accordingly:
Code: Select all
Let>FILE_PATH=C:\..\..\..\RockPack.cpr.txt
Let>BYTES_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
DeleteFile>TEMP_ps1_file
LabelToVar>ps_read_file_on_byte_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE
Let>RP_CAPTURESTDOUT=1
Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
Let>RP_CAPTURESTDOUT=0
DeleteFile>TEMP_ps1_file
Trim>RP_STDOUT,RP_STDOUT
Separate>RP_STDOUT,SPACE,ARRAY_OF_CHARACTERS
**BREAKPOINT**
/*
ps_read_file_on_byte_level:
# Function to read file and display bytes in hexadecimal format
function Get-FileBytes {
    param (
        [string]$FilePath,
        [int]$MaxBytes
    )
    # Open the file as a binary stream
    $fileStream = [System.IO.File]::OpenRead($FilePath)
    $binaryReader = New-Object System.IO.BinaryReader($fileStream)
    $byteList = @() # Array to store the bytes
    try {
        # Read bytes from the file
        for ($i = 0; $i -lt $MaxBytes; $i++) {
            $byte = $binaryReader.ReadByte()
            $hexByte = '{0:X2}' -f $byte # Convert byte to hex
            $byteList += $hexByte
        }
    }
    catch {
        Write-Host "Reached end of file or encountered error."
    }
    finally {
        # Close the file stream
        $binaryReader.Close()
        $fileStream.Close()
    }
    return $byteList # Return the list of hex bytes
}
# Specify the file path and number of bytes to read
$file = "%FILE_PATH%"
$maxBytes = %BYTES_TO_ANALYSE%
# Call the function
$byteData = Get-FileBytes -FilePath $file -MaxBytes $maxBytes
# Output the result
$byteData -join ' ' # Join bytes with space for display
*/
- Grovkillen
Re: Dealing with odd characters
Alternatively, for a UTF-8 character table:
Code: Select all
Let>FILE_PATH=C:\..\..\..\RockPack.cpr.txt
Let>CHARACTERS_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
DeleteFile>TEMP_ps1_file
LabelToVar>ps_read_file_on_unicode_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE
Let>RP_CAPTURESTDOUT=1
Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
Let>RP_CAPTURESTDOUT=0
DeleteFile>TEMP_ps1_file
Trim>RP_STDOUT,RP_STDOUT
Separate>RP_STDOUT,CRLF,ARRAY_OF_CHARACTERS
**BREAKPOINT**
/*
ps_read_file_on_unicode_level:
# Function to read a file and display the Unicode code point of each character
function Get-FileUnicode {
    param (
        [string]$FilePath,
        [int]$MaxChars
    )
    # Read the file as text with UTF-8 encoding
    $content = Get-Content -Path $FilePath -Raw -Encoding UTF8
    $unicodeList = @() # Array to store the characters and their Unicode code points
    # Loop through the characters and limit to the specified number of characters
    for ($i = 0; $i -lt [Math]::Min($MaxChars, $content.Length); $i++) {
        $char = $content[$i]
        $unicodeValue = [int][char]$char # Get the Unicode code point (decimal)
        $hexValue = '{0:X4}' -f $unicodeValue # Convert to hexadecimal format
        $unicodeList += "$char : U+$hexValue"
    }
    return $unicodeList # Return the list of characters and their Unicode code points
}
# Specify the file path and number of characters to read
$file = "%FILE_PATH%"
$maxChars = %CHARACTERS_TO_ANALYSE%
# Call the function and get the Unicode data
$unicodeData = Get-FileUnicode -FilePath $file -MaxChars $maxChars
# Output the result
$unicodeData
*/
- Phil Pendlebury
Re: Dealing with odd characters
Wow that is ingenious and way beyond my knowledge level.
However, the first script did seem to gather useful results. It looks like a series of hex values, so I guess I will have to convert them somehow. (I am looking into this now.)
I set
Code: Select all
Let>BYTES_TO_ANALYSE=100
and got
Code: Select all
0: ARRAY_OF_CHARACTERS_1=52
0: ARRAY_OF_CHARACTERS_2=49
0: ARRAY_OF_CHARACTERS_3=46
... up until
0: ARRAY_OF_CHARACTERS_11=4E
etc. up to 100 lines
The second script (UTF) seemed to gather fewer characters.
I am not sure how to continue but I will keep at it.
Thank you.
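The values in ARRAY_OF_CHARACTERS are bytes written as two-digit hex, so converting them means parsing each token as a base-16 number. Below is a minimal sketch in Python, used here only for illustration; it is not part of the Macro Scheduler scripts in this thread. The helper name `hex_tokens_to_text` and the `<XX>` placeholder format are assumptions, and the sample string holds only the three byte values quoted above. Bytes outside printable ASCII, such as NUL (00) or SUB (1A), are kept visible as placeholders rather than turned into characters.

```python
def hex_tokens_to_text(hex_string):
    """Turn space-separated hex bytes (e.g. "52 49 46") into readable text.

    Printable ASCII bytes become characters; everything else (NUL, SUB, ...)
    is kept visible as a <XX> placeholder instead of being dropped.
    """
    parts = []
    for token in hex_string.split():
        value = int(token, 16)  # parse one two-digit hex byte
        if 0x20 <= value <= 0x7E:  # printable ASCII range
            parts.append(chr(value))
        else:
            parts.append("<{:02X}>".format(value))
    return "".join(parts)


# The first three byte values reported above
print(hex_tokens_to_text("52 49 46"))  # prints: RIF
```

Those three bytes spell "RIF", which fits a file that starts with the text RIFF.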
- Grovkillen
Re: Dealing with odd characters
You may have better results writing the output to a file:
I suspect the RP_STDOUT is written to a file using ANSI...
Code: Select all
Let>FILE_PATH=C:\Users\jimmy.westberg\Downloads\RockPack.cpr.txt
Let>CHARACTERS_TO_ANALYSE=10
Let>TEMP_ps1_file=%SCRIPT_DIR%\temp_output.ps1
Let>TEMP_output_file=%SCRIPT_DIR%\temp_output.txt
DeleteFile>TEMP_ps1_file
DeleteFile>TEMP_output_file
LabelToVar>ps_read_file_on_unicode_level,TEMP_PS_CODE
WriteLn>TEMP_ps1_file,,TEMP_PS_CODE
Let>CMD_STRING=cmd /c PowerShell -ExecutionPolicy Bypass -File "%TEMP_ps1_file%"
Run>CMD_STRING
**BREAKPOINT**
ReadFile>TEMP_output_file,TEMP_contents
Trim>TEMP_contents,TEMP_contents
Separate>TEMP_contents,CRLF,ARRAY_OF_CHARACTERS
DeleteFile>TEMP_ps1_file
DeleteFile>TEMP_output_file
/*
ps_read_file_on_unicode_level:
# Function to read a file and display the Unicode code point of each character
function Get-FileUnicode {
    param (
        [string]$FilePath,
        [int]$MaxChars
    )
    # Read the file as text with UTF-8 encoding
    $content = Get-Content -Path $FilePath -Raw -Encoding UTF8
    $unicodeList = @() # Array to store the characters and their Unicode code points
    # Loop through the characters and limit to the specified number of characters
    for ($i = 0; $i -lt [Math]::Min($MaxChars, $content.Length); $i++) {
        $char = $content[$i]
        $unicodeValue = [int][char]$char # Get the Unicode code point (decimal)
        $hexValue = '{0:X4}' -f $unicodeValue # Convert to hexadecimal format
        $unicodeList += "$char : U+$hexValue"
    }
    return $unicodeList # Return the list of characters and their Unicode code points
}
# Specify the file path and number of characters to read
$file = "%FILE_PATH%"
$maxChars = %CHARACTERS_TO_ANALYSE%
# Call the function and get the Unicode data
$unicodeData = Get-FileUnicode -FilePath $file -MaxChars $maxChars
# Output the result
$unicodeData | Out-File -FilePath "%TEMP_output_file%"
*/
- Phil Pendlebury
Re: Dealing with odd characters
Thanks, that is also ingenious, but it is still giving me NUL characters and stopping after RIFF. I will keep looking at the code you provided, though. Thanks.
- Grovkillen
Re: Dealing with odd characters
But then they are null characters? Reading the file on a binary level is as low as you can go.
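If the bytes really are NULs, one possible way forward is to treat the file as binary and pull out only the readable runs, much as the Unix `strings` tool does. Below is a minimal sketch in Python, for illustration only and outside Macro Scheduler. The function name `printable_runs`, the minimum run length of 4, and the sample bytes are all assumptions; the sample is made up and is not taken from the actual RockPack file.

```python
import re


def printable_runs(data, min_length=4):
    """Return every run of printable ASCII in `data` that is at least
    `min_length` bytes long, decoded to str."""
    pattern = re.compile(rb"[\x20-\x7E]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up sample: readable words separated by NUL and SUB bytes
sample = b"RIFF\x00\x00\x00\x00\x1aTrack One\x00\x00ab\x00Guitar"
print(printable_runs(sample))  # prints: ['RIFF', 'Track One', 'Guitar']
```

For a real file, the bytes would come from something like `open(path, "rb").read()`; binary mode returns every byte, NULs included, so nothing is cut off after the first word.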
- Phil Pendlebury
Re: Dealing with odd characters
Grovkillen wrote: Mon Sep 09, 2024 7:46 pm
But then they are null characters? Reading the file on a binary level is as low as you can go.
OK, thank you. I am slightly out of my comfort zone with this, but that is a good thing; it is nice to learn new things. I will keep at it.